DuEE-Fin: A Large-Scale Dataset for Document-Level Event Extraction

Han, Cuiyun; Zhang, Jinchuan; Li, Xinyu; Xu, Guojin; Peng, Weihua; Zeng, Zengfeng

doi:10.1007/978-3-031-17120-8_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13551)

Included in the following conference series:

CCF International Conference on Natural Language Processing and Chinese Computing

Abstract

To tackle the data scarcity problem in document-level event extraction, we present DuEE-Fin, a large-scale benchmark consisting of 15,000+ events categorized into 13 event types and 81,000+ event arguments mapped to 92 argument roles. We constructed DuEE-Fin from real-world Chinese financial news, in which one document may contain several events, multiple arguments may share the same argument role, and one argument may play different roles in different events. The dataset therefore poses considerable challenges for document-level event extraction, such as multi-event recognition and multi-value argument identification, which are regarded as key issues for the task. Along with DuEE-Fin, we hosted an open competition that attracted 1,690 teams and achieved exciting results. We ran experiments on DuEE-Fin with the most popular document-level event extraction systems, and the results showed that even some SOTA models performed poorly on our data. These challenges indicate that more effective methods are needed.
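
The three properties named in the abstract (several events per document, several arguments sharing one role, and one argument taking different roles in different events) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class names, the event type and role names, and the sample values are hypothetical and are not taken from the DuEE-Fin schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Argument:
    role: str  # argument role (hypothetical name, not a DuEE-Fin role)
    text: str  # argument span as it would appear in the document


@dataclass
class Event:
    event_type: str  # hypothetical event type label
    arguments: list[Argument] = field(default_factory=list)


@dataclass
class Document:
    doc_id: str
    events: list[Event] = field(default_factory=list)


def multi_value_roles(event: Event) -> dict[str, list[str]]:
    """Return the roles of one event that are filled by more than one argument."""
    by_role = defaultdict(list)
    for arg in event.arguments:
        by_role[arg.role].append(arg.text)
    return {role: texts for role, texts in by_role.items() if len(texts) > 1}


def roles_per_argument(doc: Document) -> dict[str, set[str]]:
    """Map each argument text to every role it plays across the document's events."""
    roles = defaultdict(set)
    for event in doc.events:
        for arg in event.arguments:
            roles[arg.text].add(arg.role)
    return dict(roles)


# Invented sample document: two events, one multi-value role, one shared argument.
doc = Document(
    doc_id="example-0",
    events=[
        Event(
            event_type="share_pledge",
            arguments=[
                Argument(role="pledger", text="Company A"),
                Argument(role="pledgee", text="Bank B"),
                Argument(role="pledgee", text="Bank C"),
            ],
        ),
        Event(
            event_type="share_repurchase",
            arguments=[Argument(role="repurchaser", text="Company A")],
        ),
    ],
)

print(len(doc.events))                    # number of events in the document
print(multi_value_roles(doc.events[0]))   # roles with several arguments
print(roles_per_argument(doc)["Company A"])  # roles taken by one argument
```

In this toy record, the role "pledgee" has two values within one event, and "Company A" appears under a different role in each event, so an extractor must decide both how many events a document contains and which arguments belong to which event.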

Notes

1. https://aistudio.baidu.com/aistudio/competition/detail/46/0/task-definition.

Author information

Authors and Affiliations

Baidu Inc., Beijing, China
Cuiyun Han, Jinchuan Zhang, Xinyu Li, Guojin Xu, Weihua Peng & Zengfeng Zeng

Corresponding author

Correspondence to Cuiyun Han.

Editor information

Editors and Affiliations

Singapore University of Technology and Design, Singapore, Singapore
Wei Lu
Nanjing University, Nanjing, China
Shujian Huang
Soochow University, Suzhou, China
Yu Hong
Soochow University, Soochow, China
Xiabing Zhou

Cite this paper

Han, C., Zhang, J., Li, X., Xu, G., Peng, W., Zeng, Z. (2022). DuEE-Fin: A Large-Scale Dataset for Document-Level Event Extraction. In: Lu, W., Huang, S., Hong, Y., Zhou, X. (eds) Natural Language Processing and Chinese Computing. NLPCC 2022. Lecture Notes in Computer Science, vol 13551. Springer, Cham. https://doi.org/10.1007/978-3-031-17120-8_14

DOI: https://doi.org/10.1007/978-3-031-17120-8_14
Published: 24 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-17119-2
Online ISBN: 978-3-031-17120-8
eBook Packages: Computer Science, Computer Science (R0)
