Automatic generation of document semantics for the e-science Knowledge Grid

https://doi.org/10.1016/j.jss.2005.08.022

Abstract

This paper proposes an approach to automatically generate semantics for scientific e-documents, and presents its applications in e-document understanding, question answering and question refinement. The approach uses not only keywords and their relations in e-documents, but also the implied meaning of co-occurring keywords, which previous semantic representation approaches find hard to exploit, represent and derive. The proposed approach facilitates automatic construction, composition, decomposition and derivation of semantics at different granularity levels, which lays the basis for realizing intelligent services of the e-science Knowledge Grid.

Introduction

The Knowledge Grid is an intelligent interconnection environment that enables people or roles to effectively capture, publish, share and manage knowledge resources. It provides on-demand services to support innovation, cooperative teamwork, problem-solving and decision-making by adopting the technologies developed during work toward the future interconnection environment (Zhuge, 2004b). The e-science Knowledge Grid is an application where scientific documents need to be efficiently processed based on the understanding of content to effectively support scientific activities. The intelligent services of the Knowledge Grid require semantics to be automatically constructed, composed, decomposed and derived at different granularity levels.

Current Semantic Web approaches (Berners-Lee et al., 2001) can help but are not enough to meet these requirements. The XML-based RDF (Resource Description Framework) describes Web resources by using the object-attribute-value model. RDFS (RDF Schema) expresses the metadata of Web resources by defining vocabulary, class-based structure, and constraints (Heflin and Hendler, 2001, W3C, xxxx). SHOE (frame-based Simple HTML Ontology Extensions) supports Horn clause axioms. OIL (Ontology Inference Layer) supports description logics and frames. OWL (Web Ontology Language, Smith et al., 2003) describes classes and their relations.

A Fuzzy Cognitive Map (FCM) uses an adjacency matrix to represent relational knowledge. Reasoning with an FCM is realized by matrix operations (Liu and Satur, 1999, Noha and Lee, 2000, Kosko, 1997, Leea and Lee, 2003).

An active document framework (ADF) is a self-representable, self-explainable, and self-executable document mechanism (Zhuge, 2003). It represents document content in four aspects: granularity hierarchy, template hierarchy, background knowledge, and semantic links between fragments.

Based on the FCM and the idea of ADF, we propose an approach that generates document semantics by considering not only keywords and their relations, but also the implied meaning of co-occurring keywords in documents. Co-occurring keywords imply a certain meaning. For example, if the keywords “terrorist”, “casualty”, “explode” and “panic” co-occur in the same section or paragraph, then the topic “terror event” is likely being discussed. A topic determined by multiple co-occurring keywords usually implies rich semantics. Furthermore, the meaning of each keyword becomes more specific within the determined topic.
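The keyword example in the paragraph above can be illustrated with a small Python sketch. It is not the approach proposed in the paper, only a simple counting heuristic under stated assumptions: the function name likely_topics, the TOPIC_KEYWORDS table, the substring matching and the min_hits threshold are all hypothetical choices made for illustration.

```python
# Illustrative only: a naive co-occurrence count, not the paper's E-FCM method.
# TOPIC_KEYWORDS, likely_topics and min_hits are hypothetical names and values.

TOPIC_KEYWORDS = {
    "terror event": ["terrorist", "casualty", "explode", "panic"],
    "weather": ["rain", "storm", "flood"],
}


def likely_topics(text, topic_keywords, min_hits=3):
    """Return topics for which at least min_hits keywords occur in the text."""
    lowered = text.lower()
    topics = []
    for topic, keywords in topic_keywords.items():
        # Substring matching, so "explode" also matches "exploded".
        hits = sum(1 for keyword in keywords if keyword in lowered)
        if hits >= min_hits:
            topics.append(topic)
    return topics


paragraph = (
    "A terrorist attack caused panic downtown; the bomb exploded at noon "
    "and each casualty was taken to hospital."
)
print(likely_topics(paragraph, TOPIC_KEYWORDS))
```

A single keyword such as "panic" would not be enough to select the topic here; only several keywords appearing together in the same paragraph do, which is the intuition the example is meant to convey.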

Section snippets

Fuzzy cognitive maps

The Fuzzy Cognitive Map (FCM) is a graphical model for causal knowledge representation. It can represent not only the causal relations between keywords or phrases but also the knowledge of different granularity levels. An FCM comprises concepts (nodes) and the relations between concepts (arrows). The mathematical model of FCM is as follows (Kosko, 1997):

$$V_{c_j}(t+1) = f\left(\sum_{i=1,\, i \neq j}^{N} V_{c_i}(t)\, w_{ij}\right)$$

where $V_{c_i}$ and $V_{c_j}$ are the state values of the cause concept and the effect concept respectively; $w_{ij}$ is the weight of
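As a rough illustration of the FCM state update cited from Kosko (1997), the following Python sketch computes one update step. Several details are assumptions rather than statements from the paper: the snippet is truncated before it defines the weight, so weights[i][j] is taken here as the influence of concept i on concept j; the function f is assumed to be a sigmoid; and the three-concept map with its weight values is made up for the example.

```python
import math


def sigmoid(x):
    """Assumed threshold function f; the source snippet does not specify it."""
    return 1.0 / (1.0 + math.exp(-x))


def fcm_step(values, weights, f=sigmoid):
    """One FCM update: the new value of concept j is f applied to the sum of
    values[i] * weights[i][j] over all i different from j."""
    n = len(values)
    return [
        f(sum(values[i] * weights[i][j] for i in range(n) if i != j))
        for j in range(n)
    ]


# Hypothetical map: concept 0 influences concept 1, concept 1 influences concept 2.
weights = [
    [0.0, 0.8, 0.0],
    [0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0],
]
state = [1.0, 0.0, 0.0]
next_state = fcm_step(state, weights)
print(next_state)
```

Calling fcm_step repeatedly on its own output simulates how activation propagates through the map over time. With a sigmoid, a concept that receives no input settles at 0.5 rather than 0, which is one reason the choice of f matters in practice.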

Document understanding

Previously, document understanding meant the process of converting scanned document pages into an electronic and processable form (Altamura et al., 2000). Systems (Hobbs et al., 1988) emphasize the utility of linguistic information. Our prototype includes an E-FCM repository (denoted as EFCM-R1) that contains eight E-FCMs illustrated in Figs. 2, 5, 6, 7, 8, 9, 16 and 17. The keyword set K1 = {“FCM”, “intelligent”, “information”, “document understanding”, “vague

Conclusions

By defining semantic templates for e-documents of different granularities, this paper proposes an approach to automatically generate semantics for scientific e-documents. The approach uses a document’s keywords, their relations, and the implied meaning of co-occurring keywords, which previous semantic representation approaches find hard to exploit and reason about. The semantics can be constructed, reasoned about, composed and decomposed at different granularity levels as required. So it

References (18)

  • H. Zhuge. Active E-document framework ADF: model and tool. Information and Management (2003)
  • H. Zhuge. Retrieve images by understanding semantic links and clustering image fragments. Journal of Systems and Software (2004)
  • O. Altamura et al. Transforming paper documents into XML format with WISDOM++. International Journal of Document Analysis and Recognition (2000)
  • T. Berners-Lee et al. The semantic Web. Scientific American (2001)
  • J. Heflin et al. A portrait of the semantic Web in action. IEEE Intelligent Systems (2001)
  • Hobbs, J.P., Sticket, M., et al., 1988. Interpretation as abduction. In: Proceedings of the 26th Annual Meeting of the...
  • B. Kosko. Fuzzy Engineering (1997)
  • K.C. Leea et al. A cognitive maps simulation approach to adjusting the design factors of the electronic commerce Web sites. Expert Systems with Applications (2003)
  • Z.Q. Liu et al. Contextual fuzzy cognitive maps for decision support in geographic information systems. IEEE Transactions on Fuzzy Systems (1999)
There are more references available in the full text version of this article.

Cited by (42)

  • Building multi-subtopic Bi-level network for micro-blog hot topic based on feature Co-Occurrence and semantic community division

    2020, Journal of Network and Computer Applications
    Citation Excerpt :

    Although this method is independent of the external knowledge base, it ignores the connection between feature words and knowledge base, and also affects the accuracy of feature words extraction. In recent years, the co-occurrence relationship of words has been studied deeply (Qu et al., 2018; Yu et al., 2015; Hai and Luo, 2006; Li et al., 2019a). This method takes into account the above two methods and has significant advantages.

  • A novel rule-centric object oriented approach for document generation

    2014, Computers in Industry
    Citation Excerpt :

    Extraction of informative data from the PDF documents using pattern-matching techniques [14] is contrary to the intension of this proposal, which aims to generate documents using structured information and rules. Fuzzy cognitive mapping along with the keywords, their relations and co-occurrence of words to prepare a semantic template for scientific reasoning in e-documents is discussed in [15]. However, the resultant document can be used as query index for result retrieval only.

  • The Implementation of a Personalized Reading System

    2018, Proceedings - 2018 14th International Conference on Semantics, Knowledge and Grids, SKG 2018

Hai Zhuge is the chief scientist of the China Semantic Grid project funded by the National Basic Research Program of China. He is a professor and the director of the Key Lab of Intelligent Information Processing at the Institute of Computing Technology, Chinese Academy of Sciences, and the founder of the China Knowledge Grid Research Group (http://kg.ict.ac.cn), which has over 30 young researchers. He has presented over 10 keynotes at international conferences. He was the co-chair of the 2nd International Workshop on Knowledge Grid and Grid Intelligence, the program co-chair of the 4th International Conference on Grid and Cooperative Computing, and the co-chair of the 1st International Conference on Semantics, Knowledge and Grid. He has organized several journal special issues on Knowledge Grid and Semantic Grid. He is serving as the Area Editor of the Journal of Systems and Software, the Associate Editor of Future Generation Computer Systems, the Area Editor of the Journal of Computer Science and Technology, and an editorial board member of Information and Management and Electronic Commerce Research and Applications. His major research interest is the model, theory and methodology of the future interconnection environment. His monograph The Knowledge Grid is the first book in the area and received the 2005 Top Award of SONY Excellent Research. He is the author of over ninety papers, which have appeared mainly in leading international journals such as Communications of the ACM; IEEE Computer; IEEE Transactions on Knowledge and Data Engineering; IEEE Intelligent Systems; IEEE Computing in Science and Engineering; and IEEE Transactions on Systems, Man, and Cybernetics. One of them was among the Top 1% highly cited papers in the area according to ISI Essential Science Indicator. He is a senior member of the IEEE and a member of the ACM. He was among the top scholars in the software engineering and systems area (1999–2003) according to the assessment report published in the Journal of Systems and Software.

Xiangfeng Luo is a postdoctoral researcher in the China Knowledge Grid Research Group at the Institute of Computing Technology, Chinese Academy of Sciences. His research fields include knowledge capturing, Semantic and Knowledge Grid, artificial intelligence and pattern recognition. He is in charge of a research project supported by the National Science Foundation of China.

Research work is supported by the National Basic Research Program of China (973 project no. 2003CB317000) and the National Science Foundation of China (grants 60273020, 60402016 and 70271007).
