FairytaleQA Translated: Enabling Educational Question and Answer Generation in Less-Resourced Languages

Conference paper in Technology Enhanced Learning for Inclusive and Equitable Quality Education (EC-TEL 2024)

Abstract

Question Answering (QA) datasets are crucial in assessing reading comprehension skills for both machines and humans. While numerous datasets have been developed in English for this purpose, a noticeable void exists in less-resourced languages. To alleviate this gap, our paper introduces machine-translated versions of FairytaleQA, a renowned QA dataset designed to assess and enhance narrative comprehension skills in young children. By employing fine-tuned, modest-scale models, we establish benchmarks for both Question Generation (QG) and QA tasks on the translated datasets. In addition, we present a case study proposing a model for generating question-answer pairs, with an evaluation incorporating quality metrics such as question well-formedness, answerability, relevance, and suitability for children. Our evaluation prioritizes quantifying and describing error cases, along with providing directions for future work. This paper contributes to the advancement of QA and QG research in less-resourced languages, promoting accessibility and inclusivity in the development of these models for reading comprehension. The code and data are publicly available at github.com/bernardoleite/fairytaleqa-translated.
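For context, QG with fine-tuned seq2seq models of this kind is typically framed as a text-to-text task: the model receives an answer and its surrounding passage as input and must produce the question as output. A minimal sketch of how such training examples might be formatted follows; the "answer: ... context: ..." template and the function name are illustrative assumptions, not the exact format used in this work.

```python
# Hedged sketch: one plausible way to format a FairytaleQA example for a
# text-to-text (T5-style) question generation model. The input template
# below is an illustrative assumption, not the paper's exact format.

def format_qg_example(context: str, answer: str, question: str) -> dict:
    """Build a source/target pair for fine-tuning a seq2seq QG model."""
    source = f"answer: {answer} context: {context}"  # model input
    target = question                                # expected model output
    return {"source": source, "target": target}

example = format_qg_example(
    context="Once upon a time, a wolf lived at the edge of the forest.",
    answer="a wolf",
    question="Who lived at the edge of the forest?",
)
print(example["source"])
```

The same dataset can serve the QA task by swapping roles: the question and context become the source, and the answer becomes the target.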

Notes

  1. Henceforth, we use “QA dataset” to denote datasets comprising questions and answers, and “QA task” to refer to the computational task of answering a question.

  2. We have also included translated datasets for Italian and Romanian in our repository, although they were not studied in this research.

  3. Made available at github.com/bernardoleite/fairytaleqa-translated.

  4. Made available at github.com/bernardoleite/fairytaleqa-translated.

  5. QA datasets are typically also used for QG.

  6. It is worth noting recent efforts to make these models more accessible, for instance through quantization techniques.

  7. https://www.deepl.com/translator.

  8. https://huggingface.co/t5-base.

  9. While the difference is statistically significant, the effect size is small (Cohen’s d < 0.19).

  10. The 5 questions were randomly selected from a pool of 7, corresponding to the 7 narrative elements that can be controlled.

  11. We prompt GPT-4 Turbo to generate QA pairs from the narrative text in the target language, tailored for children aged 7 to 10, in line with FairytaleQA’s target audience.

  12. https://iave.pt/provas-e-exames/provas-e-exames/provas-de-afericao-eb/.

  13. In 4% of cases, votes were tied, leading us to seek an additional volunteer to break the tie.

  14. The following examples have been translated from Portuguese to English.
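The majority-vote procedure mentioned in note 13, where an extra volunteer is recruited only when votes tie, can be sketched as follows. The function name and label encoding are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

def majority_vote(votes):
    """Return (winner, needs_tiebreak).

    winner is None when the top two labels receive the same number of
    votes; in that case an additional vote is needed to break the tie.
    """
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None, True  # tied: recruit an additional volunteer
    return counts[0][0], False
```

For instance, `majority_vote(["well-formed", "well-formed", "not well-formed"])` yields `("well-formed", False)`, while a 1-1 split yields `(None, True)`.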

Acknowledgments

The authors thank the 15 participants who voluntarily participated in the evaluation process. We would like to specifically acknowledge those who agreed to share their names: Ângela Esteves, Bruno Miguel Pinto, David Reis, Diana Pinto, Maria de Abreu, Mariana Coelho, Pedro Costa, Rui Leixo and Vítor Magalhães. This work was financially supported by Base Funding - UIDB/00027/2020 and Programatic Funding - UIDP/00027/2020 of the Artificial Intelligence and Computer Science Laboratory - LIACC - funded by national funds through the FCT/MCTES (PIDDAC). Bernardo Leite is supported by a PhD studentship (with reference 2021.05432.BD), funded by Fundação para a Ciência e a Tecnologia (FCT).

Author information

Correspondence to Bernardo Leite.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Leite, B., Osório, T.F., Cardoso, H.L. (2024). FairytaleQA Translated: Enabling Educational Question and Answer Generation in Less-Resourced Languages. In: Ferreira Mello, R., Rummel, N., Jivet, I., Pishtari, G., Ruipérez Valiente, J.A. (eds) Technology Enhanced Learning for Inclusive and Equitable Quality Education. EC-TEL 2024. Lecture Notes in Computer Science, vol 15159. Springer, Cham. https://doi.org/10.1007/978-3-031-72315-5_16

  • DOI: https://doi.org/10.1007/978-3-031-72315-5_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72314-8

  • Online ISBN: 978-3-031-72315-5

  • eBook Packages: Computer Science (R0)
