research-article

A Joint Entity-Relation Detection and Generalization Method Based on Syntax and Semantics for Chinese Intangible Cultural Heritage Texts

Published: 13 January 2024

Abstract

Annotating a natural language corpus not only helps researchers extract knowledge from it but also enables deeper mining of the corpus. However, annotated corpora are scarce in the humanities. Semantic annotation of humanities texts is also difficult, because it demands substantial domain background from researchers and may even require the participation of domain experts. To address this, this study proposes a method for detecting entities and relations in a domain that lacks an annotated corpus, and offers a reusable approach for constructing conceptual models from textual instances. Based on syntactic and semantic features, the study proposes SPO (subject-predicate-object) triple recognition rules that give priority to predicates, and generalization rules that consider both a triple's content and the meaning of its predicate. The recognition rules extract predicate-centered SPO triples that describe the text. After the triples are clustered and adjusted, the generalization rules yield coarse-grained entities and relations, which then form a conceptual model. The method recognizes SPO triples in descriptive texts with high precision and good summarizing power, generalizes them, and forms a domain conceptual model. It offers a research approach to entity-relation detection in domains without an annotated corpus, and the resulting domain conceptual model provides a reference for building a domain Linked Data graph. The feasibility of the method is verified on texts about the four traditional Chinese festivals.
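The abstract describes the recognition rules only at a high level. As a rough illustration of what "giving priority to predicates" can mean, the sketch below first identifies predicates in a dependency parse and only then collects their subjects and objects. It is not the authors' rule set: the `Token` structure, the helper names, and the hand-written parse are hypothetical, and the dependency labels (SBV, VOB, FOB, ATT, COO, LAD, HED) are assumed to follow the tag set used by the LTP Chinese dependency parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    word: str
    head: int  # 1-based index of the governing token; 0 marks the root
    rel: str   # assumed LTP-style dependency label, e.g. "SBV", "VOB", "ATT", "COO"


def dependents(tokens: list[Token], i: int, rels: set[str]) -> list[int]:
    """1-based indices of tokens attached to token i by any relation in rels."""
    return [j for j, t in enumerate(tokens, start=1) if t.head == i and t.rel in rels]


def phrase(tokens: list[Token], i: int) -> str:
    """Token i preceded by its (recursively expanded) ATT attribute modifiers."""
    mods = dependents(tokens, i, {"ATT"})
    return "".join(phrase(tokens, j) for j in mods) + tokens[i - 1].word


def extract_spo(tokens: list[Token]) -> list[tuple[str, str, str]]:
    """Predicate-first extraction: pick predicates, then look up their arguments."""
    triples = []
    for i, tok in enumerate(tokens, start=1):
        # In this sketch a token counts as a predicate only if it governs an object
        # (VOB = verb-object, FOB = fronted object).
        objects = dependents(tokens, i, {"VOB", "FOB"})
        if not objects:
            continue
        subjects = dependents(tokens, i, {"SBV"})  # SBV = subject-verb
        # A coordinated predicate (COO) with no subject of its own
        # inherits the subject of the verb it is coordinated with.
        if not subjects and tok.rel == "COO":
            subjects = dependents(tokens, tok.head, {"SBV"})
        for s in subjects:
            for o in objects:
                triples.append((phrase(tokens, s), tok.word, phrase(tokens, o)))
    return triples


# Hand-written, hypothetical parse of 人们吃肉粽子和赛龙舟
# ("people eat meat zongzi and race dragon boats").
sentence = [
    Token("人们", 2, "SBV"),
    Token("吃", 0, "HED"),
    Token("肉", 4, "ATT"),
    Token("粽子", 2, "VOB"),
    Token("和", 6, "LAD"),
    Token("赛", 2, "COO"),
    Token("龙舟", 6, "VOB"),
]

print(extract_spo(sentence))
# [('人们', '吃', '肉粽子'), ('人们', '赛', '龙舟')]
```

A real pipeline would take the arcs from a parser instead of writing them by hand, and would still need the clustering and generalization steps described above to turn fine-grained triples like these into coarse-grained entities and relations.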

Published In

Journal on Computing and Cultural Heritage, Volume 17, Issue 1
March 2024
312 pages
EISSN: 1556-4711
DOI: 10.1145/3613493

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 13 January 2024
Online AM: 02 November 2023
Accepted: 24 July 2023
Revised: 28 May 2023
Received: 06 December 2022
Published in JOCCH Volume 17, Issue 1

Author Tags

1. Digital humanity
2. intangible cultural heritage
3. relation extraction
4. Linked Data

Bibliometrics

Total Citations: 0
Total Downloads: 286
Downloads (last 12 months): 215
Downloads (last 6 weeks): 17
Reflects downloads up to 30 Jan 2025
