ThinkTwice: A Two-Stage Method for Long-Text Machine Reading Comprehension

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13028)

Abstract

Long-text machine reading comprehension (LT-MRC) requires a machine to answer questions based on a lengthy text. Although transformer-based models achieve promising results, most of them are incapable of dealing with long sequences because processing such sequences is too time-consuming. A common workaround uses a sliding window to split the passage into equally spaced fragments and then predicts the answer from each fragment separately, without considering the other contextual fragments. However, this approach loses long-distance dependencies, which severely damages performance. To address this issue, we propose ThinkTwice, a two-stage method for LT-MRC. ThinkTwice casts LT-MRC into two main steps: 1) it first retrieves the few fragments in which the final answer is most likely to lie; 2) it then extracts the answer span from these fragments rather than from the lengthy document. We conduct experiments on NewsQA. The experimental results demonstrate that ThinkTwice can capture the most informative fragments of a long text. Meanwhile, ThinkTwice achieves considerable improvements over all existing baselines. All code has been released on GitHub (https://github.com/Walle1493/ThinkTwice).
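To make the two-stage pipeline concrete, here is a minimal Python sketch of the retrieve-then-extract idea described above. The sliding-window splitter and the two callbacks (a fragment scorer for stage 1 and a span extractor for stage 2) are hypothetical stand-ins introduced for illustration; the authors' actual models live in the released GitHub repository.

from typing import Callable, List, Tuple


def split_into_fragments(passage: str, size: int = 400, stride: int = 200) -> List[str]:
    # Sliding window: cut the long passage into equally spaced, overlapping
    # word windows, mirroring the fragmentation described in the abstract.
    words = passage.split()
    starts = range(0, max(len(words) - size, 0) + 1, stride)
    return [" ".join(words[s:s + size]) for s in starts]


def think_twice(question: str,
                passage: str,
                score_fragment: Callable[[str, str], float],
                extract_span: Callable[[str, str], Tuple[str, float]],
                top_k: int = 3) -> str:
    fragments = split_into_fragments(passage)
    # Stage 1: keep only the top-k fragments most likely to contain the answer.
    ranked = sorted(fragments, key=lambda f: score_fragment(question, f), reverse=True)
    # Stage 2: read only the retained fragments and return the best-scoring span,
    # instead of running the reader over the entire lengthy document.
    candidates = [extract_span(question, f) for f in ranked[:top_k]]
    best_span, _ = max(candidates, key=lambda c: c[1])
    return best_span

Any retriever (for example, a classifier over the encoded [CLS] token of each question-fragment pair) and any extractive reader (for example, start/end pointers over tokens) can be plugged into this skeleton; the key point is that stage 2 only ever sees the handful of fragments retained by stage 1.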


Notes

  1. For example, the maximum position embedding length of BERT is 512.

  2. [CLS] and [SEP] are special tokens. The former can, in principle, represent the overall information of the whole input sequence after encoding, and the latter is used to separate input segments, as illustrated in the sketch below.
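
To make note 2 concrete, the snippet below shows where [CLS] and [SEP] land when a question and a passage fragment are packed into one BERT input. It is an illustrative sketch using the Hugging Face transformers tokenizer and the bert-base-uncased checkpoint, which are our own assumptions here, not details taken from the paper.

from transformers import BertTokenizer

# Illustration only: the tokenizer and checkpoint are example choices, not the paper's setup.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoding = tokenizer("Who wrote the report?", "The report was written by the committee.")
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
# Expected output (approximately):
# ['[CLS]', 'who', 'wrote', 'the', 'report', '?', '[SEP]',
#  'the', 'report', 'was', 'written', 'by', 'the', 'committee', '.', '[SEP]']

The single [CLS] token at the front is the position whose encoded vector is typically read off as a summary of the whole pair, and each [SEP] marks the end of a segment.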

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Dong, M., Zou, B., Qian, J., Huang, R., Hong, Y. (2021). ThinkTwice: A Two-Stage Method for Long-Text Machine Reading Comprehension. In: Wang, L., Feng, Y., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2021. Lecture Notes in Computer Science, vol 13028. Springer, Cham. https://doi.org/10.1007/978-3-030-88480-2_34

  • DOI: https://doi.org/10.1007/978-3-030-88480-2_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88479-6

  • Online ISBN: 978-3-030-88480-2

  • eBook Packages: Computer Science (R0)
