Abstract
Event detection (ED) is a core natural language processing task that automatically extracts events of specific types from large-scale text. Applying ED to historical texts in classical Chinese helps preserve and pass on historical and cultural heritage by extracting valuable information. However, characteristics of classical Chinese, such as ambiguous word classes and complex semantics, pose challenges that have led to a lack of datasets and limited research on event schema construction. In addition, large-scale datasets in English and modern Chinese are not directly applicable to historical ED in classical Chinese. To address these issues, we constructed a logical event schema for classical Chinese historical texts and annotated a dataset according to it, the classical Chinese Historical Event Dataset (CHED). The main challenges in classical Chinese historical ED are accurately identifying and classifying events within their cultural and linguistic contexts and resolving the ambiguity that arises from words with multiple meanings in historical texts. We therefore developed a set of annotation guidelines and provided annotators with an objective reference translation. The average Kappa coefficient after multiple rounds of cross-validation is 68.49%, indicating high annotation quality and consistency. We evaluated established baseline models on several tasks in comparative experiments for historical ED in classical Chinese. The results showed that BERT+CRF performed best on the sequence labeling task, with an F1-score of 76.10%, indicating potential for further improvement. The CHED data is released at https://github.com/lcclab-blcu/CHED.
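The abstract reports annotation agreement as a Kappa coefficient but does not say which variant was used or how labels were aligned. As a minimal sketch, assuming the two-annotator Cohen's kappa and using made-up event-type labels (the names `cohen_kappa`, `annotator_1`, and `annotator_2` are hypothetical and not taken from CHED), agreement could be computed along these lines:

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical event-type labels for six candidate triggers (assumption,
# not drawn from the CHED schema).
annotator_1 = ["War", "War", "Death", "None", "None", "Death"]
annotator_2 = ["War", "None", "Death", "None", "None", "War"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"{kappa:.2f}")  # prints 0.50
```

Here the annotators agree on 4 of 6 items (observed agreement 2/3) while chance agreement is 1/3, which gives a kappa of 0.5. A figure such as 68.49% would correspond to a kappa of 0.6849 on this scale.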
Z. Feng—Equal Contribution.
Acknowledgements
This research project is supported by the National Natural Science Foundation of China (61872402) and the Science Foundation of Beijing Language and Culture University (supported by “the Fundamental Research Funds for the Central Universities”) (18ZDJ03).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wei, C., Feng, Z., Huang, S., Li, W., Shao, Y. (2023). CHED: A Cross-Historical Dataset with a Logical Event Schema for Classical Chinese Event Detection. In: Sun, M., et al. Chinese Computational Linguistics. CCL 2023. Lecture Notes in Computer Science, vol 14232. Springer, Singapore. https://doi.org/10.1007/978-981-99-6207-5_18
DOI: https://doi.org/10.1007/978-981-99-6207-5_18
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6206-8
Online ISBN: 978-981-99-6207-5
eBook Packages: Computer Science, Computer Science (R0)