
Constructing Chinese Historical Literature Knowledge Graph Based on BERT

  • Conference paper
  • Web Information Systems and Applications (WISA 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12999)

Abstract

Knowledge graph construction (KGC) aims to organize knowledge into a semantic network that reveals relations between entities. It builds on two tasks: named entity recognition (NER) and relation extraction (RE). In recent years, KGC methods for Chinese have made great progress. However, most existing methods concentrate on modern Chinese and ignore classical Chinese because of its complexity, leaving research on it relatively sparse. In this paper, we construct a high-quality labeled classical Chinese dataset for NER and RE tasks. More specifically, we conduct a series of experiments to select an optimal NER model that strengthens the whole NER and RE pipeline, augmenting our dataset iteratively and automatically. Additionally, we propose an improved RE model that better incorporates the semantic entity information extracted by the NER model. Finally, we construct a knowledge graph (KG) from Chinese historical literature and design a visualization system with intuitive display and query functions.
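The pipeline the abstract describes (NER, then RE over entity pairs, then triples for the KG) can be sketched in miniature. The BERT-based models are replaced here with toy dictionary and keyword stand-ins purely to show the data flow; the entity types, relation labels, and example sentence are illustrative assumptions, not the authors' actual configuration.

```python
# Toy sketch of the NER -> RE -> triple pipeline. Dictionary lookup and a
# keyword cue stand in for the BERT-based NER and RE models.

def toy_ner(sentence, lexicon):
    """Return (surface, type, offset) for each lexicon entry found."""
    entities = []
    for surface, etype in lexicon.items():
        pos = sentence.find(surface)
        if pos != -1:
            entities.append((surface, etype, pos))
    return sorted(entities, key=lambda e: e[2])

def toy_re(sentence, head, tail):
    """Stand-in relation classifier: keyword cue between the two entities."""
    span = sentence[head[2] + len(head[0]):tail[2]]
    if "生" in span:           # "bore/fathered" cue in classical Chinese
        return "father_of"
    return "related_to"

lexicon = {"黄帝": "PERSON", "昌意": "PERSON"}
sentence = "黄帝生昌意"        # "The Yellow Emperor fathered Changyi"

ents = toy_ner(sentence, lexicon)
triples = [(h[0], toy_re(sentence, h, t), t[0])
           for h in ents for t in ents if h[2] < t[2]]
print(triples)   # [('黄帝', 'father_of', '昌意')]
```

In the paper's actual pipeline both stand-ins would be learned models, and the NER output would additionally feed entity-type information into the RE model.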


Notes

  1. GuwenBERT: https://github.com/ethan-yt/guwenbert
  2. Neo4j: https://neo4j.com/
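Extracted triples are typically loaded into Neo4j (note 2) as Cypher statements. The sketch below only builds such a statement as a string; the `Entity` label and relation naming are illustrative assumptions, and a real loader would execute the statement through the official neo4j Python driver against a running database.

```python
# Sketch: turning an extracted (head, relation, tail) triple into a Cypher
# MERGE statement for Neo4j. MERGE is idempotent, so re-running the loader
# does not duplicate nodes or edges.

def triple_to_cypher(head, rel, tail):
    return (f"MERGE (h:Entity {{name: '{head}'}}) "
            f"MERGE (t:Entity {{name: '{tail}'}}) "
            f"MERGE (h)-[:{rel.upper()}]->(t)")

stmt = triple_to_cypher("黄帝", "father_of", "昌意")
print(stmt)
```

Production code should pass entity names as query parameters rather than interpolating them, to avoid Cypher injection and quoting issues.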


Acknowledgement

This work is supported by the China Universities Industry, Education and Research Innovation Foundation Project (2019ITA03006), and the National Training Programs of Innovation and Entrepreneurship for Undergraduates (202010056117).

Author information

Corresponding author: Xin Wang.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Guo, Q., et al. (2021). Constructing Chinese Historical Literature Knowledge Graph Based on BERT. In: Xing, C., Fu, X., Zhang, Y., Zhang, G., Borjigin, C. (eds.) Web Information Systems and Applications. WISA 2021. Lecture Notes in Computer Science, vol. 12999. Springer, Cham. https://doi.org/10.1007/978-3-030-87571-8_28

  • DOI: https://doi.org/10.1007/978-3-030-87571-8_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87570-1

  • Online ISBN: 978-3-030-87571-8

  • eBook Packages: Computer Science (R0)
