Abstract
Natural language inference (NLI) is the challenging task of determining the relationship between a pair of sentences. Existing neural network-based (NN-based) models have achieved prominent success, yet few of them are interpretable. In this paper, we propose a Multi-perspective Entailment Category Labeling System (METALs), which consists of three categories and ten sub-categories. We manually annotate 3,368 entailment items and use the annotated data to explain the recognition ability of four NN-based models at a fine-grained level. The experimental results show that all the models perform worse on commonsense reasoning than on the other entailment categories, with the largest accuracy difference being 13.22%.
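The fine-grained evaluation described above amounts to grouping model predictions by annotated entailment category and measuring accuracy within each group. The sketch below is a minimal illustration of that computation; the category names, field names, and data layout are hypothetical and not taken from the paper.

```python
from collections import defaultdict

def per_category_accuracy(examples):
    """Compute accuracy per annotated entailment category.

    `examples` is a list of dicts with hypothetical keys:
    'category' (the METALs label), 'gold', and 'pred'.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        total[ex["category"]] += 1
        if ex["pred"] == ex["gold"]:
            correct[ex["category"]] += 1
    return {cat: correct[cat] / total[cat] for cat in total}

# Toy usage with made-up items; the real data would be the 3,368 annotated pairs.
examples = [
    {"category": "commonsense reasoning", "gold": "entailment", "pred": "neutral"},
    {"category": "lexical knowledge", "gold": "entailment", "pred": "entailment"},
]
print(per_category_accuracy(examples))
```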
Acknowledgments
This work is funded by the National Key R&D Program of China, "Cloud computing and big data" key projects (2018YFB1005105).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Yu, D., Liu, L., Yu, C., Li, C. (2019). Testing the Reasoning Power for NLI Models with Annotated Multi-perspective Entailment Dataset. In: Sun, M., Huang, X., Ji, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics. CCL 2019. Lecture Notes in Computer Science, vol. 11856. Springer, Cham. https://doi.org/10.1007/978-3-030-32381-3_2
DOI: https://doi.org/10.1007/978-3-030-32381-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32380-6
Online ISBN: 978-3-030-32381-3