
Temporal Natural Language Inference: Evidence-Based Evaluation of Temporal Text Validity

  • Conference paper

Advances in Information Retrieval (ECIR 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13980)

Abstract

Learning whether text information remains valid is important for various applications, including story comprehension, information retrieval, and user state tracking on microblogs or in chatbot conversations; it also supports a deeper understanding of narratives. However, this kind of inference is still difficult for computers, as it requires temporal commonsense. We propose a novel task, Temporal Natural Language Inference, inspired by traditional natural language inference, whose goal is to determine the temporal validity of text content. The task is to infer and judge whether an action expressed in a sentence is still ongoing or already completed, and hence whether the sentence remains valid, given supplementary content. We first construct a dataset for this task and train several machine learning models on it. We then propose an effective method for learning from an external knowledge base that provides hints on temporal commonsense. Using the prepared dataset, we introduce a new machine learning model that incorporates the knowledge-base information and demonstrate that it outperforms state-of-the-art approaches on the proposed task.
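To make the input/output format of the task concrete, the following is a minimal sketch of scoring a sentence pair with a generic BERT-style pair classifier. It illustrates only the task format: the checkpoint name, the example sentences, and the binary label convention are assumptions, and this is not the knowledge-enhanced model proposed in the paper.

```python
# Illustrative sketch only: a plain sentence-pair classifier for the TNLI format.
# Checkpoint, example sentences, and label convention are assumptions; the paper's
# model additionally injects temporal commonsense from an external knowledge base.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.eval()

s1 = "I'm waiting in line to board my flight."       # target sentence whose validity is judged
s2 = "The plane just took off and I found my seat."  # supplementary (later) evidence

inputs = tokenizer(s1, s2, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Assumed label convention: index 0 = "still valid", index 1 = "no longer valid".
# An untrained classification head yields arbitrary output; fine-tuning on the
# TNLI dataset would be required before the prediction is meaningful.
print(["still valid", "no longer valid"][logits.argmax(dim=-1).item()])
```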

Notes

  1. Note that it is not always easy to determine the correct answer, as the context or necessary details might be missing; in such cases, humans seem to rely on probabilistic reasoning in addition to commonsense knowledge.

  2. The dataset will be made freely available after paper publication.

  3. Note that \(s_1\) and \(s_2\) may have a temporal order: \(t_{s_1} \le t_{s_2}\), where \(t_{s_{id}}\) \((id=1,2)\) is the creation time (or reading order) of sentence \(s_{id}\). This may be the case, for example, when receiving microblog posts issued by a user (or when reading consecutive sentences of a story or novel); a minimal data-record sketch of such a pair follows these notes.

  4. The SNLI dataset is licensed under CC BY-SA 4.0.

  5. https://www.mturk.com/.
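As referenced in note 3, here is a minimal data-record sketch of a single sentence pair with optional creation times and a gold label. The field names and label strings are illustrative assumptions, not the released dataset's actual schema.

```python
# Illustrative sketch of one TNLI instance; field names and label strings are
# assumptions, not the dataset's actual schema.
from dataclasses import dataclass
from typing import Optional


@dataclass
class TNLIExample:
    s1: str                       # target sentence whose temporal validity is judged
    s2: str                       # supplementary (follow-up) sentence providing evidence
    t_s1: Optional[float] = None  # creation/reading time of s1, if known
    t_s2: Optional[float] = None  # creation/reading time of s2; t_s1 <= t_s2 when both known
    label: Optional[str] = None   # e.g. "valid" or "invalid" (gold annotation)


example = TNLIExample(
    s1="I'm waiting for my coffee at the counter.",
    s2="This latte tastes great.",
    label="invalid",  # the waiting action appears to be completed
)
```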


Author information

Corresponding author

Correspondence to Adam Jatowt.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Hosokawa, T., Jatowt, A., Sugiyama, K. (2023). Temporal Natural Language Inference: Evidence-Based Evaluation of Temporal Text Validity. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13980. Springer, Cham. https://doi.org/10.1007/978-3-031-28244-7_28


  • DOI: https://doi.org/10.1007/978-3-031-28244-7_28


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-28243-0

  • Online ISBN: 978-3-031-28244-7

  • eBook Packages: Computer Science; Computer Science (R0)
