Abstract
Distractor generation is one of the most important and challenging tasks in the automatic generation of multiple-choice questions. Previous studies typically use only a few ground-truth distractors as training samples, overlooking many potentially usable distractors, so the strong generative ability of deep learning models may not be fully exploited. We therefore propose a data augmentation framework for distractor generation that first applies a distractor ranking model to a set of distractor candidates and then selects useful candidates as additional training samples. We further introduce a weak positive sampling and soft smooth labeling mechanism to ensure sample quality and to use the augmented samples effectively during training. Experimental results on public benchmarks demonstrate the effectiveness of the proposed method.
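The pipeline described in the abstract (rank candidates, keep the top ones as weak positives, and soften their labels) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the scoring function, the top-k cutoff, and the 0.7 smoothed label value are all hypothetical assumptions.

```python
# Hypothetical sketch of the augmentation pipeline from the abstract.
# score_fn, k, and weak_label are illustrative assumptions, not paper details.

def rank_candidates(candidates, score_fn):
    """Rank distractor candidates by a relevance score, highest first."""
    return sorted(candidates, key=score_fn, reverse=True)

def select_weak_positives(ranked, k=3):
    """Keep the top-k ranked candidates as weak positive training samples."""
    return ranked[:k]

def soft_smooth_labels(n_gold, n_weak, gold_label=1.0, weak_label=0.7):
    """Give gold distractors full labels and weak positives softened labels,
    so lower-confidence augmented samples contribute less to the loss."""
    return [gold_label] * n_gold + [weak_label] * n_weak

# Toy usage: candidate length stands in for a real ranking model's score.
candidates = ["a plausible option", "noise", "another plausible distractor"]
ranked = rank_candidates(candidates, score_fn=len)
weak_positives = select_weak_positives(ranked, k=2)
labels = soft_smooth_labels(n_gold=1, n_weak=len(weak_positives))
```

In a real system the stand-in `score_fn` would be replaced by the learned distractor ranking model, and the resulting (sample, soft label) pairs would be appended to the generator's training set.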
Acknowledgment
This work was partially supported by the National Natural Science Foundation of China (No. 61977002).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Wang, J., Bai, J., Rong, W., Ouyang, Y., Xiong, Z. (2023). Weak Positive Sampling and Soft Smooth Labeling for Distractor Generation Data Augmentation. In: Huang, DS., Premaratne, P., Jin, B., Qu, B., Jo, KH., Hussain, A. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2023. Lecture Notes in Computer Science(), vol 14089. Springer, Singapore. https://doi.org/10.1007/978-981-99-4752-2_62
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4751-5
Online ISBN: 978-981-99-4752-2