Abstract
Natural language inference (NLI) aims to identify the logical relationship between a premise and a corresponding hypothesis, which requires a model to effectively capture their semantic relationship. Most existing transformer-based models concatenate the premise and hypothesis as a single input and capture their relationship through the multi-head self-attention mechanism, which may consider only their plain context-sensitive relationship and neglect the potential mutual influence of their contextual semantics. To better model the relationship between the premise and hypothesis, we propose a new transformer-based model, RAN4NLI, which consists of a sequence encoder based on a pre-trained language model for encoding the input semantics and an interaction network based on residual attention for further capturing their relationship. We utilize residual attention to combine multi-head self-attention and cross-attention information, strengthening the potential semantic relationship between the premise and hypothesis. Experiments conducted on two canonical datasets, SNLI and SciTail, demonstrate that RAN4NLI achieves performance comparable to that of strong baseline models.
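Since the abstract only sketches the architecture, the following is a minimal PyTorch sketch of one possible residual-attention interaction block combining self-attention and premise-hypothesis cross-attention on top of pre-trained encoder outputs. The class name ResidualAttentionInteraction, the hidden size of 768, the use of nn.MultiheadAttention, and the particular residual summation and normalization are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of a residual-attention interaction block for NLI.
# Assumes PyTorch >= 1.9 (for batch_first MultiheadAttention); all layer
# sizes and the combination scheme are illustrative, not the paper's spec.
import torch
import torch.nn as nn


class ResidualAttentionInteraction(nn.Module):
    """Combine self-attention and premise-hypothesis cross-attention
    through a residual connection (one plausible reading of RAN4NLI)."""

    def __init__(self, hidden_size: int = 768, num_heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, premise: torch.Tensor, hypothesis: torch.Tensor) -> torch.Tensor:
        # Self-attention over the premise captures its internal context.
        self_out, _ = self.self_attn(premise, premise, premise)
        # Cross-attention lets the premise attend to the hypothesis tokens.
        cross_out, _ = self.cross_attn(premise, hypothesis, hypothesis)
        # Residually combine both attention views with the original encoding.
        return self.norm(premise + self_out + cross_out)


if __name__ == "__main__":
    # Toy usage with random tensors standing in for pre-trained LM outputs.
    premise_enc = torch.randn(2, 16, 768)     # (batch, premise_len, hidden)
    hypothesis_enc = torch.randn(2, 12, 768)  # (batch, hypothesis_len, hidden)
    block = ResidualAttentionInteraction()
    fused = block(premise_enc, hypothesis_enc)
    print(fused.shape)  # torch.Size([2, 16, 768])
```

In this sketch the residual sum lets the cross-attention signal augment, rather than replace, the context already encoded by self-attention; the actual gating or weighting used in the paper may differ.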
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Yu, S., Su, J., Ye, X., Ma, D. (2024). Improving Natural Language Inference with Residual Attention. In: Huang, DS., Premaratne, P., Yuan, C. (eds) Applied Intelligence. ICAI 2023. Communications in Computer and Information Science, vol 2015. Springer, Singapore. https://doi.org/10.1007/978-981-97-0827-7_29
DOI: https://doi.org/10.1007/978-981-97-0827-7_29
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0826-0
Online ISBN: 978-981-97-0827-7