
Constructing Chinese Historical Literature Knowledge Graph Based on BERT

  • Conference paper
  • Web Information Systems and Applications (WISA 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12999)

Abstract

Knowledge graph construction (KGC) aims to organize knowledge into a semantic network that reveals relations between entities. It builds on two tasks: named entity recognition (NER) and relation extraction (RE). In recent years, KGC methods for Chinese have made great progress. However, most existing methods concentrate on modern Chinese and ignore classical Chinese because of its complexity, leaving research on it relatively sparse. In this paper, we construct a high-quality labeled classical Chinese dataset for NER and RE tasks. More specifically, we conduct a series of experiments to select an optimal NER model that strengthens the whole NER and RE pipeline, augmenting our dataset iteratively and automatically. Additionally, we propose an improved RE model that better incorporates the semantic entity information extracted by the NER model. Finally, we construct a knowledge graph (KG) from Chinese historical literature and design a visualization system with intuitive display and query functions.
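The pipeline the abstract describes (NER, then RE over entity pairs, then triples for the KG) can be sketched in miniature. The BERT-based models are replaced here with toy dictionary and keyword stand-ins purely to show the data flow; the entity types, relation labels, and example sentence are illustrative assumptions, not the authors' actual configuration.

```python
# Toy sketch of the NER -> RE -> triple pipeline. Dictionary lookup and a
# keyword cue stand in for the BERT-based NER and RE models.

def toy_ner(sentence, lexicon):
    """Return (surface, type, offset) for each lexicon entry found."""
    entities = []
    for surface, etype in lexicon.items():
        pos = sentence.find(surface)
        if pos != -1:
            entities.append((surface, etype, pos))
    return sorted(entities, key=lambda e: e[2])

def toy_re(sentence, head, tail):
    """Stand-in relation classifier: keyword cue between the two entities."""
    span = sentence[head[2] + len(head[0]):tail[2]]
    if "生" in span:           # "bore/fathered" cue in classical Chinese
        return "father_of"
    return "related_to"

lexicon = {"黄帝": "PERSON", "昌意": "PERSON"}
sentence = "黄帝生昌意"        # "The Yellow Emperor fathered Changyi"

ents = toy_ner(sentence, lexicon)
triples = [(h[0], toy_re(sentence, h, t), t[0])
           for h in ents for t in ents if h[2] < t[2]]
print(triples)   # [('黄帝', 'father_of', '昌意')]
```

In the paper's actual pipeline both stand-ins would be learned models, and the NER output would additionally feed entity-type information into the RE model.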


Notes

  1. GuwenBERT: https://github.com/ethan-yt/guwenbert
  2. Neo4j: https://neo4j.com/
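Extracted triples are typically loaded into Neo4j (note 2) as Cypher statements. The sketch below only builds such a statement as a string; the `Entity` label and relation naming are illustrative assumptions, and a real loader would execute the statement through the official neo4j Python driver against a running database.

```python
# Sketch: turning an extracted (head, relation, tail) triple into a Cypher
# MERGE statement for Neo4j. MERGE is idempotent, so re-running the loader
# does not duplicate nodes or edges.

def triple_to_cypher(head, rel, tail):
    return (f"MERGE (h:Entity {{name: '{head}'}}) "
            f"MERGE (t:Entity {{name: '{tail}'}}) "
            f"MERGE (h)-[:{rel.upper()}]->(t)")

stmt = triple_to_cypher("黄帝", "father_of", "昌意")
print(stmt)
```

Production code should pass entity names as query parameters rather than interpolating them, to avoid Cypher injection and quoting issues.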


Acknowledgement

This work is supported by the China Universities Industry, Education and Research Innovation Foundation Project (2019ITA03006), and the National Training Programs of Innovation and Entrepreneurship for Undergraduates (202010056117).

Author information

Corresponding author: Xin Wang.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Guo, Q., et al. (2021). Constructing Chinese Historical Literature Knowledge Graph Based on BERT. In: Xing, C., Fu, X., Zhang, Y., Zhang, G., Borjigin, C. (eds.) Web Information Systems and Applications. WISA 2021. Lecture Notes in Computer Science, vol. 12999. Springer, Cham. https://doi.org/10.1007/978-3-030-87571-8_28

  • DOI: https://doi.org/10.1007/978-3-030-87571-8_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87570-1

  • Online ISBN: 978-3-030-87571-8

  • eBook Packages: Computer Science (R0)
