Abstract
Distractors are the incorrect answer options in multiple-choice reading comprehension questions, designed to mislead or confuse test-takers. In real-world exam settings, creating distractors for English reading comprehension questions is a complex and varied task with subjective and diverse evaluation standards, so developing a distractor generation technique that meets real-world requirements is highly challenging and of significant research value. To address these challenges, we introduce DGRL (Distractors Generation based on Reinforcement Learning from preference feedback), a method that trains cutting-edge large language models with reinforcement learning to generate multiple distractors for real-world human exams. First, the distractor generation model is fine-tuned on a reading comprehension question dataset through supervised fine-tuning (SFT). Then, following reinforcement learning from preference feedback, we build and train a reward model to evaluate the quality of individual distractors. Combining the reward model with a diversity evaluation metric, we design an objective function and further train the fine-tuned model with reinforcement learning. Experiments show that DGRL, after SFT and reinforcement learning, can generate multiple high-quality distractors that meet real-world requirements in a single pass, serving as a valuable reference and aid for question setting in real-world human exams.
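The abstract describes an objective that combines a learned reward model's quality score for each distractor with a diversity metric over the generated set. The following is a minimal, hypothetical sketch of such a combined reward in Python: the quality function stands in for the trained reward model, the diversity term is a distinct-n ratio, and the function names and weighting scheme are illustrative assumptions rather than the paper's exact formulation.

```python
from typing import Callable, List


def distinct_n(distractors: List[str], n: int = 2) -> float:
    """Ratio of unique n-grams to total n-grams across all distractors
    (a simple set-level diversity measure)."""
    ngrams = []
    for d in distractors:
        tokens = d.split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


def combined_reward(
    distractors: List[str],
    quality_fn: Callable[[str], float],  # stand-in for the trained reward model
    alpha: float = 0.7,                  # assumed quality/diversity trade-off weight
) -> float:
    """Scalar reward for one generated set of distractors: average
    per-distractor quality plus a diversity bonus over the whole set."""
    quality = sum(quality_fn(d) for d in distractors) / len(distractors)
    diversity = distinct_n(distractors, n=2)
    return alpha * quality + (1.0 - alpha) * diversity


if __name__ == "__main__":
    # Toy usage with a dummy quality function that favors longer distractors.
    demo = [
        "He missed the train because of the weather.",
        "He decided to walk home instead.",
        "The station was closed for repairs.",
    ]
    print(combined_reward(demo, quality_fn=lambda d: min(len(d.split()) / 10, 1.0)))
```

In an RL fine-tuning loop (e.g., PPO-style training), a scalar of this form would serve as the return for each generated set of distractors; the actual reward model and diversity metric used by DGRL may differ.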
Acknowledgments
This study was funded by the Natural Science Foundation of Beijing, China (No. 4242019), and the Beijing Natural Science Foundation-Xiaomi Innovation Joint Fund, China (No. L233008).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, R., Jiang, Y., Tao, Y., Li, M., Wang, X., Ge, S. (2025). High-Quality Distractors Generation for Human Exam Based on Reinforcement Learning from Preference Feedback. In: Wong, D.F., Wei, Z., Yang, M. (eds) Natural Language Processing and Chinese Computing. NLPCC 2024. Lecture Notes in Computer Science, vol. 15362. Springer, Singapore. https://doi.org/10.1007/978-981-97-9440-9_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-9439-3
Online ISBN: 978-981-97-9440-9
eBook Packages: Computer Science (R0)