GameOfThronesQA: Answer-Aware Question-Answer Pairs for TV Series

Lahiri, Aritra Kumar; Hu, Qinmin Vivian

doi:10.1007/978-3-030-99739-7_21

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13186)

Included in the following conference series: European Conference on Information Retrieval

Abstract

In this paper, we offer a corpus of question-answer pairs about a TV series, generated from paragraph contexts. The dataset, called GameofThronesQA V1.0, contains 5237 unique question-answer pairs drawn from all eight seasons of the Game of Thrones TV series. In particular, we provide a pipeline approach for answer-aware question generation, in which the answers are extracted based on named entities from the TV series. This differs from traditional methods, which generate questions first and find the relevant answers later. Furthermore, we provide a comparative analysis of the generated corpus against benchmark datasets such as SQuAD, TriviaQA, WikiQA and TweetQA. A snapshot of the dataset is provided as an appendix for review purposes and will be released to the public later.
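The answer-first ordering described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the regular expression is a crude stand-in for a real named-entity recognizer, the "answer: ... context: ..." input layout is an assumed convention for a sequence-to-sequence question generator, and the function names and the example sentence are hypothetical.

```python
import re

# Assumption: runs of capitalized words stand in for named entities.
# A real pipeline would use a trained named-entity recognizer instead.
ENTITY_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def extract_candidate_answers(paragraph):
    """Return capitalized spans in order of first appearance, without duplicates."""
    seen = []
    for match in ENTITY_PATTERN.finditer(paragraph):
        span = match.group(0)
        if span not in seen:
            seen.append(span)
    return seen


def build_generation_inputs(paragraph):
    """Pair every candidate answer with its paragraph, answer first.

    The "answer: ... context: ..." layout is a hypothetical prompt format;
    each string would be passed to a question generation model.
    """
    return [
        {"answer": answer, "model_input": f"answer: {answer} context: {paragraph}"}
        for answer in extract_candidate_answers(paragraph)
    ]


# Illustrative sentence written for this sketch, not taken from the corpus.
paragraph = "Arya Stark trains in Braavos with Jaqen."

for item in build_generation_inputs(paragraph):
    print(item["model_input"])
```

Each extracted entity yields one model input, so the answer is fixed before any question is generated, which is the reverse of the question-first ordering the abstract contrasts against.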

Acknowledgments

This study was supported in part by the Discovery and CREATE grants from the Natural Sciences and Engineering Research Council (NSERC) of Canada.

Author information

Authors and Affiliations

Ryerson University, Toronto, Canada
Aritra Kumar Lahiri & Qinmin Vivian Hu

Authors

Aritra Kumar Lahiri
Qinmin Vivian Hu

Corresponding author

Correspondence to Qinmin Vivian Hu.

Editor information

Editors and Affiliations

Martin Luther University Halle-Wittenberg, Halle, Germany
Matthias Hagen
Leiden University, Leiden, The Netherlands
Suzan Verberne
University of Glasgow, Glasgow, UK
Craig Macdonald
University of Duisburg-Essen, Essen, Germany
Christin Seifert
University of Stavanger, Stavanger, Norway
Krisztian Balog
Norwegian University of Science and Technology, Trondheim, Norway
Kjetil Nørvåg
University of Stavanger, Stavanger, Norway
Vinay Setty

5 Appendix

Some sample QA pairs from our corpus are given below for users to review and to get an understanding of the generated dataset. The 5237 generated QA pairs are all unique, although different questions may share the same answer.
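The uniqueness property stated above (unique pairs, but answers that may repeat across questions) could be checked along the following lines. This is a sketch under assumptions: the paper does not say how uniqueness was enforced, so the case and whitespace normalization, the function names, and the sample pairs are all hypothetical and not drawn from the corpus.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace (an assumed notion of 'same text')."""
    return " ".join(text.lower().split())


def deduplicate_pairs(pairs):
    """Keep the first occurrence of each (question, answer) pair."""
    seen = set()
    unique = []
    for question, answer in pairs:
        key = (normalize(question), normalize(answer))
        if key not in seen:
            seen.add(key)
            unique.append((question, answer))
    return unique


def questions_by_answer(pairs):
    """Group questions under the answer they share."""
    groups = defaultdict(list)
    for question, answer in pairs:
        groups[answer].append(question)
    return dict(groups)


# Hypothetical pairs written for this sketch, not taken from the corpus.
sample_pairs = [
    ("Who trains in Braavos?", "Arya Stark"),
    ("Who is the younger daughter of Ned Stark?", "Arya Stark"),
    ("Who trains in  Braavos?", "Arya Stark"),  # duplicate up to whitespace
]

unique_pairs = deduplicate_pairs(sample_pairs)
shared = questions_by_answer(unique_pairs)
print(len(unique_pairs), len(shared["Arya Stark"]))
```

Keying on the pair rather than on the answer alone is what lets two different questions keep the same answer while exact repeats are dropped.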

About this paper

Cite this paper

Lahiri, A.K., Hu, Q.V. (2022). GameOfThronesQA: Answer-Aware Question-Answer Pairs for TV Series. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13186. Springer, Cham. https://doi.org/10.1007/978-3-030-99739-7_21

DOI: https://doi.org/10.1007/978-3-030-99739-7_21
Published: 05 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99738-0
Online ISBN: 978-3-030-99739-7
eBook Packages: Computer Science (R0)
