Abstract
Distractors are the incorrect answer options in multiple-choice reading comprehension questions, designed to mislead or confuse test-takers. In real-world exam settings, creating distractors for English reading comprehension questions is a complex and varied task with subjective and diverse evaluation standards, so developing a distractor generation technique that meets real-world requirements is highly challenging and of significant research value. To address these challenges, we introduce DGRL (Distractors Generation based on Reinforcement Learning from preference feedback), a method that trains cutting-edge large language models with reinforcement learning to generate multiple distractors for real-world human exams. First, the distractor generation model is fine-tuned on a reading comprehension question dataset through supervised fine-tuning (SFT). Then, following reinforcement learning from preference feedback, we build and train a reward model to evaluate the quality of individual distractors. Combining the reward model with a diversity evaluation metric, we design an objective function and further train the fine-tuned model with reinforcement learning. Experiments show that DGRL, after SFT and reinforcement learning, can generate multiple high-quality distractors that meet real-world requirements in a single pass, serving as a valuable reference and aid for question setting in real-world human exams.
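The abstract describes an objective that combines a learned reward model's quality score for each distractor with a diversity metric over the generated set. The following is a minimal, hypothetical sketch of such a combined reward in Python: the quality function stands in for the trained reward model, the diversity term is a distinct-n ratio, and the function names and weighting scheme are illustrative assumptions rather than the paper's exact formulation.

```python
from typing import Callable, List


def distinct_n(distractors: List[str], n: int = 2) -> float:
    """Ratio of unique n-grams to total n-grams across all distractors
    (a simple set-level diversity measure)."""
    ngrams = []
    for d in distractors:
        tokens = d.split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


def combined_reward(
    distractors: List[str],
    quality_fn: Callable[[str], float],  # stand-in for the trained reward model
    alpha: float = 0.7,                  # assumed quality/diversity trade-off weight
) -> float:
    """Scalar reward for one generated set of distractors: average
    per-distractor quality plus a diversity bonus over the whole set."""
    quality = sum(quality_fn(d) for d in distractors) / len(distractors)
    diversity = distinct_n(distractors, n=2)
    return alpha * quality + (1.0 - alpha) * diversity


if __name__ == "__main__":
    # Toy usage with a dummy quality function that favors longer distractors.
    demo = [
        "He missed the train because of the weather.",
        "He decided to walk home instead.",
        "The station was closed for repairs.",
    ]
    print(combined_reward(demo, quality_fn=lambda d: min(len(d.split()) / 10, 1.0)))
```

In an RL fine-tuning loop (e.g., PPO-style training), a scalar of this form would serve as the return for each generated set of distractors; the actual reward model and diversity metric used by DGRL may differ.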
Acknowledgments
This study was funded by the Natural Science Foundation of Beijing, China (No. 4242019), and the Beijing Natural Science Foundation-Xiaomi Innovation Joint Fund, China (No. L233008).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, R., Jiang, Y., Tao, Y., Li, M., Wang, X., Ge, S. (2025). High-Quality Distractors Generation for Human Exam Based on Reinforcement Learning from Preference Feedback. In: Wong, D.F., Wei, Z., Yang, M. (eds) Natural Language Processing and Chinese Computing. NLPCC 2024. Lecture Notes in Computer Science, vol. 15362. Springer, Singapore. https://doi.org/10.1007/978-981-97-9440-9_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-9439-3
Online ISBN: 978-981-97-9440-9
eBook Packages: Computer Science (R0)