CHED: A Cross-Historical Dataset with a Logical Event Schema for Classical Chinese Event Detection

Wei, Congcong; Feng, Zhenbing; Huang, Shutan; Li, Wei; Shao, Yanqiu

doi:10.1007/978-981-99-6207-5_18

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14232)

Included in the following conference series:

China National Conference on Chinese Computational Linguistics

Abstract

Event detection (ED) is a crucial area of natural language processing that automates the extraction of specific event types from large-scale text. Studying historical ED in classical Chinese texts helps preserve and pass on historical and cultural heritage by extracting valuable information. However, characteristics of classical Chinese, such as ambiguous word classes and complex semantics, pose challenges and have led to a lack of datasets and limited research on event schema construction. In addition, large-scale datasets in English and modern Chinese are not directly applicable to historical ED in classical Chinese. To address these issues, we constructed a logical event schema for classical Chinese historical texts and annotated a dataset with it, called the Classical Chinese Historical Event Dataset (CHED). The main challenges in classical Chinese historical ED are accurately identifying and classifying events within their cultural and linguistic contexts and resolving the ambiguity that arises from words with multiple meanings in historical texts. We therefore developed a set of annotation guidelines and provided annotators with an objective reference translation. The average Kappa coefficient after multiple rounds of cross-validation is 68.49%, which we take to indicate high annotation quality and consistency. We ran several tasks and comparative experiments with established baseline models for historical ED in classical Chinese. BERT+CRF performed best on the sequence labeling task, with an F1-score of 76.10%, which leaves room for further improvement. The CHED data is released at https://github.com/lcclab-blcu/CHED.
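The two figures quoted in the abstract, an inter-annotator Kappa coefficient and an F1-score on sequence labeling, can be illustrated with a minimal sketch. It rests on assumptions the abstract does not confirm: that agreement is measured with Cohen's kappa between two annotators, that event triggers are encoded with BIO tags, and that F1 is computed over exact span matches. The function names and the event types `War` and `Death` are hypothetical and are not taken from the CHED schema.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items (assumed variant)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def bio_spans(tags):
    """Extract (start, end, type) spans from a BIO tag sequence."""
    spans, start, kind = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        inside = tag.startswith("I-") and tag[2:] == kind
        if start is not None and not inside:
            spans.add((start, i, kind))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans


def span_f1(gold_tags, pred_tags):
    """F1 over exact span matches between gold and predicted BIO tags."""
    gold, pred = bio_spans(gold_tags), bio_spans(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, not drawn from CHED.
annotator_1 = ["War", "War", "O", "O"]
annotator_2 = ["War", "O", "O", "O"]
kappa = cohen_kappa(annotator_1, annotator_2)

gold = ["B-War", "I-War", "O", "B-Death"]
pred = ["B-War", "I-War", "O", "O"]
f1 = span_f1(gold, pred)
```

On the toy data the annotators agree on three of four items with a chance agreement of 0.5, so kappa is 0.5; the prediction recovers one of two gold spans with no false positives, so F1 is 2/3. Whether the paper scores at the span or token level, and which kappa variant it reports, would need to be checked against the full text.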

Z. Feng—Equal Contribution.

Acknowledgements

This research project is supported by the National Natural Science Foundation of China (61872402) and the Science Foundation of Beijing Language and Culture University (supported by “the Fundamental Research Funds for the Central Universities”) (18ZDJ03).

Author information

Authors and Affiliations

School of Information Science, Beijing Language and Culture University, Beijing, China
Congcong Wei, Zhenbing Feng, Shutan Huang, Wei Li & Yanqiu Shao
Language Resources Monitoring and Research Center, Beijing, China
Congcong Wei, Zhenbing Feng, Shutan Huang, Wei Li & Yanqiu Shao

Corresponding author

Correspondence to Yanqiu Shao.

Editor information

Editors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, China
Maosong Sun
Harbin Institute of Technology, Harbin, China
Bing Qin
Fudan University, Shanghai, China
Xipeng Qiu
School of Computing and Information, Singapore Management University, Singapore, Singapore
Jiang Jing
Institute of Software, Chinese Academy of Sciences, Beijing, China
Xianpei Han
Beijing Language and Culture University, Beijing, China
Gaoqi Rao
Chinese Academy of Sciences, Institute of Automation, Beijing, China
Yubo Chen

About this paper

Cite this paper

Wei, C., Feng, Z., Huang, S., Li, W., Shao, Y. (2023). CHED: A Cross-Historical Dataset with a Logical Event Schema for Classical Chinese Event Detection. In: Sun, M., et al. Chinese Computational Linguistics. CCL 2023. Lecture Notes in Computer Science, vol 14232. Springer, Singapore. https://doi.org/10.1007/978-981-99-6207-5_18

DOI: https://doi.org/10.1007/978-981-99-6207-5_18
Published: 20 September 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6206-8
Online ISBN: 978-981-99-6207-5
eBook Packages: Computer Science, Computer Science (R0)
