DOI: 10.1145/3459637.3482207 (CIKM '21 Conference Proceedings, short paper)

Uncertainty-Aware Self-Training for Semi-Supervised Event Temporal Relation Extraction

Published: 30 October 2021

ABSTRACT

Extracting event temporal relations is an important task in natural language understanding. Many supervised approaches to event temporal relation extraction have been proposed, but they typically require a large amount of human-annotated training data, and annotation for this task is time-consuming and challenging. We therefore study semi-supervised event temporal relation extraction. Self-training, a widely used semi-supervised learning method, can be applied to this problem, but it suffers from noisy pseudo-labels. In this paper, we propose an uncertainty-aware self-training framework (UAST) that quantifies model uncertainty to cope with pseudo-labeling errors. Specifically, UAST comprises (1) an Uncertainty Estimation module that computes model uncertainty when pseudo-labeling unlabeled data; (2) a Sample Selection with Exploration module that selects informative samples based on the uncertainty estimates; and (3) an Uncertainty-Aware Learning module that explicitly incorporates model uncertainty into the self-training process. Experimental results indicate that our approach significantly outperforms previous state-of-the-art methods.
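The abstract names the three modules but does not give their exact formulations. As an illustrative sketch only, uncertainty estimation and confidence-based sample selection in such self-training frameworks are commonly built on Monte Carlo dropout: run T stochastic forward passes per unlabeled sample, take the predictive entropy of the averaged class distribution as the uncertainty score, and keep the most confident pseudo-labels. The function names and the choice of T below are hypothetical, not taken from the paper.

```python
# Generic MC-dropout-style uncertainty scoring and pseudo-label selection.
# This is a sketch of the common recipe, not the paper's exact UAST method.
import math

T = 10  # number of stochastic forward passes (assumed value)

def predictive_entropy(mc_probs):
    """Entropy of the class distribution averaged over stochastic passes."""
    n_classes = len(mc_probs[0])
    mean = [sum(p[c] for p in mc_probs) / len(mc_probs) for c in range(n_classes)]
    return -sum(p * math.log(p) for p in mean if p > 0)

def select_low_uncertainty(unlabeled, stochastic_forward, k):
    """Pseudo-label each unlabeled sample and keep the k most certain ones.

    `stochastic_forward(x)` stands in for one dropout-enabled forward pass
    returning a class-probability list.
    """
    scored = []
    for x in unlabeled:
        mc_probs = [stochastic_forward(x) for _ in range(T)]
        n_classes = len(mc_probs[0])
        mean = [sum(p[c] for p in mc_probs) / T for c in range(n_classes)]
        label = max(range(n_classes), key=lambda c: mean[c])
        scored.append((predictive_entropy(mc_probs), x, label))
    scored.sort(key=lambda t: t[0])  # lowest entropy = most certain
    return [(x, y) for _, x, y in scored[:k]]
```

In a full pipeline the selected (sample, pseudo-label) pairs would be added to the training set, with the uncertainty score additionally available as a per-sample weight in the loss, which is the role the abstract assigns to the Uncertainty-Aware Learning module.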

Supplemental Material

CIKM2021.mp4 (mp4, 91.8 MB)


Published in

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
October 2021
4966 pages
ISBN: 9781450384469
DOI: 10.1145/3459637

Copyright © 2021 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• short-paper

Acceptance Rates

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
