Abstract
Long-text machine reading comprehension (LT-MRC) requires a machine to answer questions based on a lengthy text. Although transformer-based models achieve promising results, most of them cannot handle long sequences because of their high computational cost. A common workaround is a sliding window that splits the passage into equally spaced fragments and then predicts the answer from each fragment separately, without considering the other contextual fragments. However, this approach loses long-distance dependencies, which severely damages performance. To address this issue, we propose ThinkTwice, a two-stage method for LT-MRC. ThinkTwice casts LT-MRC into two main steps: 1) it first retrieves the fragments in which the final answer is most likely to lie; 2) it then extracts the answer span from these fragments instead of from the whole lengthy document. We conduct experiments on NewsQA. The results demonstrate that ThinkTwice captures the most informative fragments of a long text and achieves considerable improvements over all existing baselines. All code has been released on GitHub (https://github.com/Walle1493/ThinkTwice).
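The retrieve-then-extract idea described in the abstract can be illustrated with a minimal sketch. The code below assumes a Hugging Face Transformers setup; the model name (bert-base-uncased), the sliding-window sizes, the top-k value, and the helper functions split_into_fragments and answer are illustrative assumptions, not the authors' released implementation (see the GitHub repository above for that).

```python
# Minimal sketch of a two-stage retrieve-then-extract pipeline for long-text MRC.
# All names and hyperparameters here are assumptions for illustration only.
import torch
from transformers import AutoTokenizer, AutoModel, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
retriever = AutoModel.from_pretrained("bert-base-uncased")                    # stage 1: scores fragments
reader = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")  # stage 2: extracts span


def split_into_fragments(passage, size=384, stride=128):
    """Sliding-window split of a long passage into overlapping word fragments."""
    words = passage.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), stride)]


def answer(question, passage, top_k=2):
    fragments = split_into_fragments(passage)

    # Stage 1: rank fragments by cosine similarity between the question's and
    # each fragment's [CLS] representation, and keep the top-k candidates.
    with torch.no_grad():
        q_vec = retriever(**tokenizer(question, return_tensors="pt")).last_hidden_state[:, 0]
        scores = []
        for frag in fragments:
            f_vec = retriever(**tokenizer(frag, truncation=True, return_tensors="pt")).last_hidden_state[:, 0]
            scores.append(torch.cosine_similarity(q_vec, f_vec).item())
    best = sorted(range(len(fragments)), key=lambda i: scores[i], reverse=True)[:top_k]

    # Stage 2: run the extractive reader only on the retrieved fragments and
    # return the highest-scoring answer span.
    best_span, best_score = "", float("-inf")
    for i in best:
        inputs = tokenizer(question, fragments[i], truncation=True, return_tensors="pt")
        with torch.no_grad():
            out = reader(**inputs)
        start, end = out.start_logits.argmax(), out.end_logits.argmax()
        score = (out.start_logits[0, start] + out.end_logits[0, end]).item()
        if end >= start and score > best_score:
            best_span = tokenizer.decode(inputs["input_ids"][0, start:end + 1])
            best_score = score
    return best_span
```

Because the reader only sees the retrieved fragments, the expensive extraction step runs on a few 512-token-sized inputs rather than on every window of the lengthy document.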
Notes
1. For example, the maximum position embedding length of BERT is 512.
2. [CLS] and [SEP] are special tokens. The former can theoretically represent the overall information of the whole input sequence after encoding, and the latter is used to separate input segments.
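As a brief illustration of note 2 (assuming the Hugging Face BERT tokenizer, an illustrative tooling choice rather than one stated in the paper), the special tokens are inserted automatically when a question and a fragment are encoded as a pair:

```python
# [CLS] marks the start of the sequence; [SEP] separates the two segments.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
ids = tokenizer("Where is the answer?", "The answer is in the text.")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))
# ['[CLS]', 'where', 'is', 'the', 'answer', '?', '[SEP]',
#  'the', 'answer', 'is', 'in', 'the', 'text', '.', '[SEP]']
```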
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Dong, M., Zou, B., Qian, J., Huang, R., Hong, Y. (2021). ThinkTwice: A Two-Stage Method for Long-Text Machine Reading Comprehension. In: Wang, L., Feng, Y., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2021. Lecture Notes in Computer Science, vol 13028. Springer, Cham. https://doi.org/10.1007/978-3-030-88480-2_34
DOI: https://doi.org/10.1007/978-3-030-88480-2_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88479-6
Online ISBN: 978-3-030-88480-2
eBook Packages: Computer Science, Computer Science (R0)