SMART: A Stratified Machine Reading Test

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11838)

Abstract

We present a Stratified MAchine Reading Test (SMART) data set for Chinese in which each question is assigned a “level” that reflects the type of reasoning needed to answer it. The data set consists of close to 40K question-answer pairs, and its stratified design allows machine reading researchers to quickly focus on the areas that present the most challenge for a machine comprehension system. We further establish a baseline for future research with BERT, and present results showing that the levels we have designed correspond well with the difficulty BERT experiences in answering these questions, as reflected by lower accuracy at higher levels. We have also collected human answers to the questions in the test portion of the data set, and show that humans and the machine face different challenges when answering these questions. This means that even though the machine is approaching human-level performance on this task, humans and the machine perform it through very different mechanisms.
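The per-level comparison described in the abstract can be illustrated with a minimal sketch. It assumes each evaluated question is available as a (level, gold answer, predicted answer) triple and that correctness is judged by exact string match. Both the record layout and the matching rule are assumptions made for illustration, not the format or metric used in the paper, and the toy records are invented solely to exercise the function.

```python
from collections import defaultdict


def accuracy_by_level(records):
    """Return {level: accuracy} for an iterable of (level, gold, predicted) triples.

    Exact string match after stripping whitespace is an assumed scoring rule,
    not necessarily the one used for SMART.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for level, gold, predicted in records:
        total[level] += 1
        if gold.strip() == predicted.strip():
            correct[level] += 1
    # Sort levels so the result reads from the lowest level to the highest.
    return {level: correct[level] / total[level] for level in sorted(total)}


# Toy records invented for illustration only (not SMART data or results).
toy_records = [
    (1, "A", "A"),
    (1, "B", "B"),
    (2, "C", "C"),
    (2, "D", "X"),
    (3, "E", "Y"),
]

per_level = accuracy_by_level(toy_records)
for level, acc in per_level.items():
    print(f"level {level}: accuracy {acc:.2f}")
```

Running the same function once on system predictions and once on human answers would give two per-level profiles that can be compared side by side, which is the kind of analysis the stratified design is meant to support.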

We would like to thank the students from Ludong University, particularly Liang Jian, Xu Yuanyuan, and Shang Guofeng, and students from Nanjing Normal University, particularly Liu Han, Cao Ziyan, and Mao Xuefen, for their assistance with data preparation. The second author would like to acknowledge the support from a National Language Committee project (YB135-23) and a Jiangsu Higher Institutions’ Excellent Innovative Team for Philosophy and Social Sciences project (2017STD006). The third author would like to acknowledge the support of a National Language Committee “13th Five-Year” Research Plan project (ZD|135-22).

Notes

  1. See the leaderboard at https://rajpurkar.github.io/SQuAD-explorer/. On SQuAD 1.0, a number of systems have surpassed human performance, and on SQuAD 2.0, state-of-the-art systems are approaching human performance.

  2. Data will be made available here: https://www.cs.brandeis.edu/~clp/smart.

  3. https://github.com/attardi/wikiextractor.

Corresponding author

Correspondence to Nianwen Xue.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Yao, J., Feng, M., Feng, H., Wang, Z., Zhang, Y., Xue, N. (2019). SMART: A Stratified Machine Reading Test. In: Tang, J., Kan, MY., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol 11838. Springer, Cham. https://doi.org/10.1007/978-3-030-32233-5_6

  • DOI: https://doi.org/10.1007/978-3-030-32233-5_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32232-8

  • Online ISBN: 978-3-030-32233-5

  • eBook Packages: Computer Science, Computer Science (R0)
