
High-Quality Distractors Generation for Human Exam Based on Reinforcement Learning from Preference Feedback

  • Conference paper
  • In: Natural Language Processing and Chinese Computing (NLPCC 2024)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 15362)


Abstract

Distractors are incorrect answer options designed to mislead or confuse test-takers in multiple-choice reading comprehension questions. In real-world exam settings, creating distractors for English reading comprehension questions is a complex and varied task with subjective, diverse evaluation standards. Developing a distractor generation technique that meets real-world requirements is therefore highly challenging and of significant research value. To address these challenges, we introduce DGRL (Distractors Generation based on Reinforcement Learning from preference feedback), a method that trains cutting-edge large language models with reinforcement learning to generate multiple distractors for real-world human exams. First, the distractor generation model is fine-tuned through supervised fine-tuning (SFT) on a reading comprehension question dataset. Then, using reinforcement learning from preference feedback, we build and train a reward model to evaluate the quality of individual distractors. Combining the reward model with a diversity evaluation metric, we design an objective function and further train the fine-tuned model with reinforcement learning. Experiments show that DGRL, after SFT and reinforcement learning, can generate multiple high-quality distractors that meet real-world requirements in a single pass, serving as a valuable reference and aid for question setting in real-world human exams.
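
The objective function sketched in the abstract combines a learned reward model score with a diversity evaluation metric. As a minimal illustrative sketch only (the paper's exact formulation is not reproduced here), the combined reward for a batch of generated distractors might look like the following Python, where reward_model_score is a hypothetical stand-in for the trained reward model, a distinct-bigram ratio stands in for the diversity metric, and the weighting coefficient lam is likewise an assumption:

    def distinct_n(texts, n=2):
        """Ratio of unique n-grams to total n-grams across all distractors.
        Higher values mean the generated distractors differ more from one
        another (a common diversity proxy; assumed here, not taken from
        the paper)."""
        ngrams = []
        for text in texts:
            tokens = text.split()
            ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

    def combined_reward(distractors, reward_model_score, lam=0.5):
        """Sketch of a combined objective: average per-distractor quality
        from the reward model plus a weighted diversity bonus. The averaging
        and the linear combination are assumptions, not the paper's formula."""
        quality = sum(reward_model_score(d) for d in distractors) / len(distractors)
        diversity = distinct_n(distractors, n=2)
        return quality + lam * diversity

    # Toy usage with a dummy reward model that favors longer distractors;
    # the duplicated option lowers the diversity term.
    options = [
        "He wanted to impress his teacher.",
        "He was afraid of losing the game.",
        "He wanted to impress his teacher.",
    ]
    print(combined_reward(options, reward_model_score=lambda d: len(d.split())))

A scalar reward of this shape is the kind of quantity a policy-gradient method would maximize during the reinforcement learning stage described above.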



Acknowledgments

This study was funded by the Natural Science Foundation of Beijing, China (No. 4242019), and the Beijing Natural Science Foundation-Xiaomi Innovation Joint Fund, China (L233008).

Author information

Correspondence to Yuru Jiang.


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Wang, R., Jiang, Y., Tao, Y., Li, M., Wang, X., Ge, S. (2025). High-Quality Distractors Generation for Human Exam Based on Reinforcement Learning from Preference Feedback. In: Wong, D.F., Wei, Z., Yang, M. (eds) Natural Language Processing and Chinese Computing. NLPCC 2024. Lecture Notes in Computer Science, vol 15362. Springer, Singapore. https://doi.org/10.1007/978-981-97-9440-9_8


  • DOI: https://doi.org/10.1007/978-981-97-9440-9_8


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-9439-3

  • Online ISBN: 978-981-97-9440-9

  • eBook Packages: Computer Science, Computer Science (R0)
