short-paper

Uncertainty-Aware Self-Training for Semi-Supervised Event Temporal Relation Extraction

Authors:
Pengfei Cao

Institute of Automation, CAS & University of Chinese Academy of Sciences, Beijing, China

Institute of Automation, CAS & University of Chinese Academy of Sciences, Beijing, China
View Profile

,
Xinyu Zuo

Institute of Automation, CAS & Tencent, Beijing, China

Institute of Automation, CAS & Tencent, Beijing, China
View Profile

,
Yubo Chen

Institute of Automation, CAS & University of Chinese Academy of Sciences, Beijing, China

Institute of Automation, CAS & University of Chinese Academy of Sciences, Beijing, China
View Profile

,
Kang Liu

Institute of Automation, CAS & University of Chinese Academy of Sciences, Beijing, China

Institute of Automation, CAS & University of Chinese Academy of Sciences, Beijing, China
View Profile

,
Jun Zhao

Institute of Automation, CAS & University of Chinese Academy of Sciences, Beijing, China

Institute of Automation, CAS & University of Chinese Academy of Sciences, Beijing, China
View Profile

,
Wei Bi

Tencent, Beijing, China

Tencent, Beijing, China
View Profile

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge ManagementOctober 2021Pages 2900–2904https://doi.org/10.1145/3459637.3482207

Published:30 October 2021Publication History

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management

Pages 2900–2904

ABSTRACT

Extracting event temporal relations is an important task for natural language understanding. Many works have been proposed for supervised event temporal relation extraction, which typically requires a large amount of human-annotated data for model training. However, the data annotation for this task is very time-consuming and challenging. To this end, we study the problem of semi-supervised event temporal relation extraction. Self-training as a widely used semi-supervised learning method can be utilized for this problem. However, it suffers from the noisy pseudo-labeling problem. In this paper, we propose the use of uncertainty-aware self-training framework (UAST) to quantify the model uncertainty for coping with pseudo-labeling errors. Specifically, UAST utilizes (1) Uncertainty Estimation module to compute the model uncertainty for pseudo-labeling unlabeled data; (2) Sample Selection with Exploration module to select informative samples based on uncertainty estimates; and (3) Uncertainty-Aware Learning module to explicitly incorporate the model uncertainty into the self-training process. Experimental results indicate that our approach significantly outperforms previous state-of-the-art methods.

Supplemental Material

CIKM2021.mp4

mp4

91.8 MB

Download

References

Taylor Cassidy, Bill McDowell, Nathanael Chambers, and Steven Bethard. 2014. An Annotation Framework for Dense Event Ordering. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 501--506.Google ScholarCross Ref
Nathanael Chambers, Taylor Cassidy, Bill McDowell, and Steven Bethard. 2014. Dense Event Ordering with a Multi-Pass Architecture. Transactions of the Association for Computational Linguistics (2014), 273--284.Google Scholar
Snigdha Chaturvedi, Haoruo Peng, and Dan Roth. 2017. Story Comprehension for Predicting What Happens Next. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 1603--1614.Google ScholarCross Ref
Fei Cheng and Yusuke Miyao. 2017. Classifying Temporal Relations by Bidirectional LS™ over Dependency Paths. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 1--6.Google ScholarCross Ref
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL-HLT. 4171--4186.Google Scholar
Quang Do, Wei Lu, and Dan Roth. 2012. Joint Inference for Event Timeline Construction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. 677--687. Google ScholarDigital Library
Yarin Gal and Zoubin Ghahramani. 2016. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of the 33nd International Conference on Machine Learning, ICML 2016,, Maria-Florina Balcan and Kilian Q. Weinberger (Eds.). 1050--1059. Google ScholarDigital Library
Yarin Gal, Riashat Islam, and Zoubin Ghahramani. 2017. Deep Bayesian Active Learning with Image Data. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Doina Precup and Yee Whye Teh (Eds.). 1183--1192. Google ScholarDigital Library
Rujun Han, I-Hung Hsu, Mu Yang, Aram Galstyan, Ralph Weischedel, and Nanyun Peng. 2019 a. Deep Structured Neural Network for Event Temporal Relation Extraction. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL). 666--106.Google ScholarCross Ref
Rujun Han, Qiang Ning, and Nanyun Peng. 2019 b. Joint Event and Temporal Relation Extraction with Shared Representations and Structured Prediction. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 434--444.Google ScholarCross Ref
Rujun Han, Yichao Zhou, and Nanyun Peng. 2020. Domain Knowledge Empowered Structured Neural Net for End-to-End Event Temporal Relation Extraction. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 5717--5729.Google ScholarCross Ref
Junxian He, Jiatao Gu, Jiajun Shen, and Marc'Aurelio Ranzato. 2020. Revisiting Self-Training for Neural Sequence Generation. In 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net.Google Scholar
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation (1997), 1735--1780.Google Scholar
Neil Houlsby, Ferenc Huszár, Zoubin Ghahramani, and Máté Lengyel. 2011. Bayesian active learning for classification and preference learning. arXiv preprint arXiv:1112.5745 (2011).Google Scholar
Alex Kendall and Yarin Gal. 2017. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017,, Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.). 5574--5584. Google ScholarDigital Library
Daniel Khashabi, Tushar Khot, Ashish Sabharwal, and Dan Roth. 2018. Question Answering as Global Reasoning Over Semantic Abstractions. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), Sheila A. McIlraith and Kilian Q. Weinberger (Eds.). 1905--1914.Google Scholar
Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, Yoshua Bengio and Yann LeCun (Eds.).Google Scholar
Hongtao Lin, Jun Yan, Meng Qu, and Xiang Ren. 2019. Learning Dual Retrieval Module for Semi-supervised Relation Extraction. In The World Wide Web Conference, WWW 2019, Ling Liu, Ryen W. White, Amin Mantrach, Fabrizio Silvestri, Julian J. McAuley, Ricardo Baeza-Yates, and Leila Zia (Eds.). 1073--1083. Google ScholarDigital Library
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019).Google Scholar
Yuanliang Meng and Anna Rumshisky. 2018. Context-Aware Neural Model for Temporal Information Extraction. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 527--536.Google ScholarCross Ref
Yu Meng, Yunyi Zhang, Jiaxin Huang, Chenyan Xiong, Heng Ji, Chao Zhang, and Jiawei Han. 2020. Text Classification Using Label Names Only: A Language Model Self-Training Approach. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 9006--9017.Google ScholarCross Ref
Subhabrata Mukherjee and Ahmed Awadallah. 2020. Uncertainty-aware self-training for few-shot text classification. Advances in Neural Information Processing Systems (2020).Google Scholar
Qiang Ning, Zhili Feng, Hao Wu, and Dan Roth. 2018a. Joint Reasoning for Temporal and Causal Relations. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2278--2288.Google ScholarCross Ref
Qiang Ning, Sanjay Subramanian, and Dan Roth. 2019. An Improved Neural Baseline for Temporal Relation Extraction. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 6203--6209.Google ScholarCross Ref
Qiang Ning, Hao Wu, and Dan Roth. 2018b. A Multi-Axis Annotation Scheme for Event Temporal Relations. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 1318--1328.Google ScholarCross Ref
Qiang Ning, Ben Zhou, Zhili Feng, Haoruo Peng, and Dan Roth. 2018c. CogCompTime: A Tool for Understanding Time in Natural Language. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 72--77.Google ScholarCross Ref
Yilin Niu, Fangkai Jiao, Mantong Zhou, Ting Yao, Jingfang Xu, and Minlie Huang. 2020. A Self-Training Method for Machine Reading Comprehension with Soft Evidence Extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 3916--3927.Google ScholarCross Ref
Gerhard Paass. 1993. Assessing and improving neural network predictions by the bootstrap algorithm. In Advances in Neural Information Processing Systems. 196--203. Google ScholarDigital Library
James Pustejovsky, Patrick Hanks, Roser Sauri, Andrew See, Robert Gaizauskas, Andrea Setzer, Dragomir Radev, Beth Sundheim, David Day, Lisa Ferro, et al. [n. d.]. The timebank corpus.Google Scholar
Chuck Rosenberg, Martial Hebert, and Henry Schneiderman. 2005. Semi-Supervised Self-Training of Object Detection Models. In 2005 Seventh IEEE Workshops on Applications of Computer Vision (WACV/MOTION'05)-Volume 1. 29--36. Google ScholarDigital Library
Antti Tarvainen and Harri Valpola. 2017. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.). 1195--1204. Google ScholarDigital Library
Naushad UzZaman, Hector Llorens, Leon Derczynski, James Allen, Marc Verhagen, and James Pustejovsky. 2013. SemEval-2013 Task 1: TempEval-3: Evaluating Time Expressions, Events, and Temporal Relations. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013). 1--9.Google Scholar
Haoyu Wang, Muhao Chen, Hongming Zhang, and Dan Roth. 2020. Joint Constrained Learning for Event-Event Relation Extraction. In Proceedings of EMNLP. 696--706.Google ScholarCross Ref
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. 2017. Understanding deep learning requires rethinking generalization. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings.Google Scholar

Index Terms

Uncertainty-Aware Self-Training for Semi-Supervised Event Temporal Relation Extraction
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Information extraction

Recommendations

Sentiment labeling for extending initial labeled data to improve semi-supervised sentiment classification

Semi-supervised framework which exploits unsupervised approach (JST) is proposed.Self-training suffers from incorrectly labeling problem with insufficient data.Confidently predicted instances are labeled and used as training data by JST.Self-training ...
Read More
Self-Training with Selection-by-Rejection
ICDM '12: Proceedings of the 2012 IEEE 12th International Conference on Data Mining

Practical machine learning and data mining problems often face shortage of labeled training data. Self-training algorithms are among the earliest attempts of using unlabeled data to enhance learning. Traditional self-training algorithms label unlabeled ...
Read More
Uncertainty-aware deep co-training for semi-supervised medical image segmentation
Abstract
Semi-supervised learning has made significant strides in the medical domain since it alleviates the heavy burden of collecting abundant pixel-wise annotated data for semantic segmentation tasks. Existing semi-supervised approaches ...
Highlights
- We exposed the flaws of the semi-supervised segmentation method for medical images.
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
October 2021
4966 pages
ISBN:9781450384469
DOI:10.1145/3459637
General Chairs:
Gianluca Demartini
The University of Queensland, Australia
,
Guido Zuccon
The University of Queensland, Australia
,
Program Chairs:
J. Shane Culpepper
RMIT University, Australia
,
Zi Huang
The University of Queensland, Australia
,
Hanghang Tong
University of Illinois at Urbana-Champaign, USA
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 30 October 2021
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
event temporal relation extraction
self-training
uncertainty
Qualifiers
- short-paper
Conference

Acceptance Rates
Overall Acceptance Rate1,861of8,427submissions,22%
Upcoming Conference
CIKM '24

Sponsor:

sigir

sigir

The 33rd ACM International Conference on Information and Knowledge Management

October 21 - 25, 2024

Boise , ID , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 2
  Total Citations
  View Citations
- 363
  Total Downloads
- Downloads (Last 12 months)52
- Downloads (Last 6 weeks)5
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Uncertainty-Aware Self-Training for Semi-Supervised Event Temporal Relation Extraction

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Sentiment labeling for extending initial labeled data to improve semi-supervised sentiment classification

Self-Training with Selection-by-Rejection

Uncertainty-aware deep co-training for semi-supervised medical image segmentation