Abstract
In this paper, we offer a corpus of question-answer pairs for a TV series, generated from paragraph contexts. The dataset, called GameofThronesQA V1.0, contains 5237 unique question-answer pairs from the Game Of Thrones TV series across its eight seasons. In particular, we provide a pipeline approach for answer-aware question generation, in which the answers are extracted based on the named entities from the TV series. This differs from traditional methods, which generate questions first and find the relevant answers later. Furthermore, we provide a comparative analysis of the generated corpus against benchmark datasets such as SQuAD, TriviaQA, WikiQA and TweetQA. A snapshot of the dataset is provided as an appendix for review purposes and will be released to the public later.
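To make the answer-first ordering concrete, the following is a minimal sketch rather than the pipeline used in the paper. It assumes a hypothetical gazetteer (`ENTITY_GAZETTEER`) in place of a real named-entity recognizer and a hypothetical `<hl>` marker to highlight the answer span. The question-generation model that would consume these inputs is not included.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical gazetteer standing in for a real named-entity recognizer.
ENTITY_GAZETTEER = ["Jon Snow", "Winterfell", "Ned Stark"]


def extract_answers(paragraph: str, gazetteer: List[str]) -> List[Tuple[str, int]]:
    """Return (entity, start_offset) for each gazetteer entity found in the paragraph."""
    found = []
    for entity in gazetteer:
        for match in re.finditer(re.escape(entity), paragraph):
            found.append((entity, match.start()))
    # Order candidate answers by where they appear in the text.
    return sorted(found, key=lambda pair: pair[1])


def build_generation_inputs(paragraph: str, gazetteer: List[str]) -> List[Dict[str, str]]:
    """Pair each extracted answer with a context whose answer span is highlighted.

    The "<hl>" marker is an assumed convention for illustration only.
    """
    inputs = []
    for answer, start in extract_answers(paragraph, gazetteer):
        end = start + len(answer)
        highlighted = paragraph[:start] + "<hl> " + answer + " <hl>" + paragraph[end:]
        inputs.append({"answer": answer, "context": highlighted})
    return inputs


# Illustrative paragraph written for this sketch, not taken from the corpus.
paragraph = "Ned Stark rules Winterfell."
examples = build_generation_inputs(paragraph, ENTITY_GAZETTEER)
for example in examples:
    print(example["answer"], "|", example["context"])
```

Each resulting record fixes the answer before any question exists, so a downstream generator only has to produce a question whose answer is the highlighted span.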
Acknowledgments
This study was supported in part by the Discovery and CREATE grants from the Natural Sciences and Engineering Research Council (NSERC) of Canada.
5 Appendix
Some sample QA pairs from our corpus are given below so that readers can review them and get a sense of the generated dataset. All 5237 generated QA pairs are unique, although different questions may share the same answer.
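The distinction between unique pairs and repeated answers can be checked with a short sketch like the one below. It is illustrative only: the helper names are hypothetical, the case-insensitive normalization is an assumption, and the sample records are placeholders rather than entries from the corpus.

```python
from collections import Counter
from typing import Dict, List, Tuple

QAPair = Tuple[str, str]


def deduplicate_pairs(pairs: List[QAPair]) -> List[QAPair]:
    """Keep the first occurrence of each (question, answer) pair.

    Pairs are compared after trimming whitespace and lowercasing,
    which is an assumed normalization for this sketch.
    """
    seen = set()
    unique = []
    for question, answer in pairs:
        key = (question.strip().lower(), answer.strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append((question, answer))
    return unique


def shared_answers(pairs: List[QAPair]) -> Dict[str, int]:
    """Return each answer that appears with more than one question, with its count."""
    counts = Counter(answer for _, answer in pairs)
    return {answer: n for answer, n in counts.items() if n > 1}


# Placeholder records, not real corpus entries.
raw = [("Q1 text", "A1"), ("Q2 text", "A1"), ("Q1 text", "A1")]
unique = deduplicate_pairs(raw)
print(len(unique), shared_answers(unique))
```

Here the repeated pair is dropped, while the two distinct questions that share one answer are both kept.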
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lahiri, A.K., Hu, Q.V. (2022). GameOfThronesQA: Answer-Aware Question-Answer Pairs for TV Series. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13186. Springer, Cham. https://doi.org/10.1007/978-3-030-99739-7_21
DOI: https://doi.org/10.1007/978-3-030-99739-7_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99738-0
Online ISBN: 978-3-030-99739-7
eBook Packages: Computer Science (R0)