CHED: A Cross-Historical Dataset with a Logical Event Schema for Classical Chinese Event Detection

  • Conference paper
Chinese Computational Linguistics (CCL 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14232)

Abstract

Event detection (ED) is a core task in natural language processing that automatically extracts events of specific types from large-scale text. Studying historical ED in classical Chinese texts helps preserve and pass on historical and cultural heritage by extracting valuable information. However, characteristics of classical Chinese, such as ambiguous word classes and complex semantics, pose challenges that have led to a lack of datasets and limited research on event schema construction. In addition, large-scale datasets in English and modern Chinese are not directly applicable to historical ED in classical Chinese. To address these issues, we constructed a logical event schema for classical Chinese historical texts and annotated a dataset according to it, called the Classical Chinese Historical Event Dataset (CHED). The main challenges in classical Chinese historical ED are accurately identifying and classifying events within their cultural and linguistic contexts and resolving the ambiguity caused by polysemous words in historical texts. We therefore developed a set of annotation guidelines and provided annotators with an objective reference translation. The average Kappa coefficient after multiple rounds of cross-validation is 68.49%, indicating substantial agreement and consistent annotation quality. We conducted several tasks and comparative experiments with established baseline models for historical ED in classical Chinese. BERT+CRF performed best on the sequence labeling task, with an F1 score of 76.10%, which leaves room for further improvement. The CHED data is released at https://github.com/lcclab-blcu/CHED.
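For readers unfamiliar with the agreement statistic reported in the abstract, the following is a minimal sketch of Cohen's kappa for two annotators. The abstract does not state which Kappa variant was used or how annotators were paired, so the choice of Cohen's kappa, the function name `cohen_kappa`, and the toy labels (`War`, `Death`, `O`) are assumptions for illustration only, not the authors' procedure or the CHED schema.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    own label distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)

    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over labels of the product of marginal frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # Degenerate case: both annotators always use the same single label.
    if p_e == 1.0:
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical token-level event labels from two annotators ("O" = no event).
annotator_a = ["War", "O", "Death", "O", "War", "O"]
annotator_b = ["War", "O", "O", "O", "War", "Death"]

kappa = cohen_kappa(annotator_a, annotator_b)
print(round(kappa, 4))  # 0.4545
```

In this toy case the annotators agree on 4 of 6 items (p_o ≈ 0.667) while chance agreement is 14/36 (p_e ≈ 0.389), giving kappa = 5/11. A reported average such as 68.49% would correspond to averaging values like this over annotator pairs or annotation rounds, under the same assumption about the variant.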

Z. Feng—Equal Contribution.

Notes

  1. https://github.com/NiuTrans/Classical-Modern

  2. https://github.com/hankcs/pyhanlp

  3. https://github.com/facebookresearch/fastText

  4. https://guoxue.httpcn.com/zt/24shi/

  5. https://github.com/doccano

  6. https://github.com/taishan1994/pytorch_bert_bilstm_crf_ner

  7. https://github.com/thunlp/OpenPrompt

Acknowledgements

This research project is supported by the National Natural Science Foundation of China (61872402) and the Science Foundation of Beijing Language and Culture University (supported by “the Fundamental Research Funds for the Central Universities”) (18ZDJ03).

Author information

Corresponding author

Correspondence to Yanqiu Shao.

A Event Schema of the CHED

Fig. 11. Event schema of the CHED in English

Fig. 12. Event schema of the CHED in Chinese

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wei, C., Feng, Z., Huang, S., Li, W., Shao, Y. (2023). CHED: A Cross-Historical Dataset with a Logical Event Schema for Classical Chinese Event Detection. In: Sun, M., et al. Chinese Computational Linguistics. CCL 2023. Lecture Notes in Computer Science, vol 14232. Springer, Singapore. https://doi.org/10.1007/978-981-99-6207-5_18

  • DOI: https://doi.org/10.1007/978-981-99-6207-5_18

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-6206-8

  • Online ISBN: 978-981-99-6207-5

  • eBook Packages: Computer Science, Computer Science (R0)
