Automatic generation of document semantics for the e-science Knowledge Grid

doi:10.1016/j.jss.2005.08.022

Journal of Systems and Software

Volume 79, Issue 7, July 2006, Pages 969-983

Abstract

This paper proposes an approach to automatically generating semantics for scientific e-documents and presents its applications in e-document understanding, question answering and question refinement. The approach uses not only keywords and their relations in e-documents, but also the implied meaning of co-occurring keywords, which previous semantic representation approaches find hard to exploit, represent and derive. The proposed approach facilitates automatic construction, composition, decomposition and derivation of semantics at different granularity levels, which lays the basis for realizing intelligent services of the e-science Knowledge Grid.

Introduction

The Knowledge Grid is an intelligent interconnection environment that enables people or roles to effectively capture, publish, share and manage knowledge resources. It provides on-demand services to support innovation, cooperative teamwork, problem-solving and decision-making by adopting the technologies developed during work toward the future interconnection environment (Zhuge, 2004b). The e-science Knowledge Grid is an application where scientific documents need to be efficiently processed based on the understanding of content to effectively support scientific activities. The intelligent services of the Knowledge Grid require semantics to be automatically constructed, composed, decomposed and derived at different granularity levels.

Current Semantic Web approaches (Berners-Lee et al., 2001) can help but are not sufficient to meet these requirements. The XML-based RDF (Resource Description Framework) describes Web resources by using the object-attribute-value model. RDFS (RDF Schema) expresses the metadata of Web resources by defining vocabulary, class-based structure, and constraints (Heflin and Hendler, 2001, W3C, xxxx). SHOE (frame-based Simple HTML Ontology Extensions) supports Horn clause axioms. OIL (Ontology Inference Layer) supports description logics and frames. OWL (Web Ontology Language, Smith et al., 2003) describes classes and their relations.
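As a rough illustration of the object-attribute-value model mentioned above, the following sketch stores a few statements as plain Python tuples and looks up the values of one attribute. The resource identifier `doc:paper-1`, the attribute names and the helper `values_of` are hypothetical and chosen only for this example; real RDF uses URIs and a dedicated serialization, which this sketch does not attempt to reproduce.

```python
# Each statement is an (object, attribute, value) tuple.
# The identifiers and attribute names below are hypothetical examples.
triples = [
    ("doc:paper-1", "title", "Automatic generation of document semantics"),
    ("doc:paper-1", "keyword", "FCM"),
    ("doc:paper-1", "keyword", "Knowledge Grid"),
]


def values_of(statements, obj, attribute):
    """Return every value recorded for the given object and attribute."""
    return [v for (o, a, v) in statements if o == obj and a == attribute]


if __name__ == "__main__":
    print(values_of(triples, "doc:paper-1", "keyword"))
```

A flat list of statements like this describes individual resources, but by itself it says nothing about how meanings compose across granularity levels, which is the gap the paragraph above points to.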

The Fuzzy Cognitive Map (FCM) uses an adjacency matrix to represent relational knowledge. Reasoning in an FCM is realized by matrix operations (Liu and Satur, 1999, Noh and Lee, 2000, Kosko, 1997, Lee and Lee, 2003).
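The matrix-based reasoning can be sketched as follows. Each concept's next state is a threshold function f applied to the weighted sum of the other concepts' current states, following the standard FCM update rule (Kosko, 1997). The sigmoid used for f, the function names and the two-concept example are assumptions made for illustration; they are not taken from the paper.

```python
import math


def sigmoid(x):
    """Assumed threshold function f; other squashing functions are possible."""
    return 1.0 / (1.0 + math.exp(-x))


def fcm_step(state, weights, f=sigmoid):
    """One synchronous update: V_j(t+1) = f(sum over i != j of V_i(t) * w_ij).

    weights[i][j] is the weight of the edge from cause concept i
    to effect concept j (the adjacency matrix of the map).
    """
    n = len(state)
    return [
        f(sum(state[i] * weights[i][j] for i in range(n) if i != j))
        for j in range(n)
    ]


def fcm_run(state, weights, steps, f=sigmoid):
    """Apply fcm_step repeatedly and return the final state vector."""
    current = list(state)
    for _ in range(steps):
        current = fcm_step(current, weights, f)
    return current


if __name__ == "__main__":
    # Hypothetical two-concept map: concept 0 fully causes concept 1.
    w = [[0.0, 1.0],
         [0.0, 0.0]]
    print(fcm_step([1.0, 0.0], w))
```

The diagonal of the weight matrix is skipped explicitly, mirroring the i ≠ j condition in the update rule.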

An active document framework (ADF) is a self-representable, self-explainable, and self-executable document mechanism (Zhuge, 2003). It represents document content in four aspects: granularity hierarchy, template hierarchy, background knowledge, and semantic links between fragments.

Based on the FCM and the idea of ADF, we propose an approach that generates document semantics by considering not only keywords and their relations, but also the implied meaning of co-occurring keywords in documents. Co-occurring keywords imply a certain meaning. For example, if the keywords “terrorist”, “casualty”, “explode” and “panic” co-occur in the same section or paragraph, then the topic “terror event” is likely to be under discussion. A topic determined by multiple co-occurring keywords usually implies rich semantics. Furthermore, the meaning of each keyword becomes more specific within the determined topic.
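The co-occurrence idea in the example above can be illustrated with a deliberately simple sketch: a topic is reported when a large enough share of its associated keywords appears in the same paragraph. This is not the paper's E-FCM method. The keyword table, the second topic "fireworks show", the function name `likely_topics` and the 0.75 threshold are all hypothetical choices for illustration.

```python
# Hypothetical topic-to-keyword table; only "terror event" and its four
# keywords come from the example in the text.
TOPIC_KEYWORDS = {
    "terror event": {"terrorist", "casualty", "explode", "panic"},
    "fireworks show": {"explode", "celebration", "crowd"},
}


def likely_topics(paragraph_keywords, topic_keywords, min_overlap=0.75):
    """Return (topic, overlap) pairs whose keyword overlap meets the threshold.

    overlap is the fraction of a topic's keywords found in the paragraph.
    Results are sorted from the highest overlap to the lowest.
    """
    found = set(paragraph_keywords)
    scores = {}
    for topic, keywords in topic_keywords.items():
        overlap = len(found & keywords) / len(keywords)
        if overlap >= min_overlap:
            scores[topic] = overlap
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    paragraph = ["terrorist", "casualty", "explode", "panic", "city"]
    print(likely_topics(paragraph, TOPIC_KEYWORDS))
```

Note that "explode" alone does not select either topic; only the joint presence of several keywords does, which is the point of using co-occurrence rather than single keywords.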

Section snippets

Fuzzy cognitive maps

The Fuzzy Cognitive Map (FCM) is a graphical model for causal knowledge representation. It can represent not only the causal relations between keywords or phrases but also knowledge at different granularity levels. An FCM comprises concepts (nodes) and the relations between concepts (arrows). The mathematical model of the FCM is as follows (Kosko, 1997): $V_{cj}(t + 1) = f\left(\sum_{\substack{i = 1 \\ i \neq j}}^{N} V_{ci}(t)\, w_{ij}\right)$ where $V_{ci}$ and $V_{cj}$ are the state values of the cause concept and the effect concept respectively; $w_{ij}$ is the weight of

Document understanding

Previous work treats document understanding as the process of converting scanned document pages into an electronic and processable form (Altamura et al., 2000). Some systems (Hobbs et al., 1988) emphasize the utility of linguistic information. Our prototype includes an E-FCM repository (denoted as EFCM-R1) that contains the eight E-FCMs illustrated in Figs. 2, 5, 6, 7, 8, 9, 16 and 17. The keyword set K1 = {“FCM”, “intelligent”, “information”, “document understanding”, “vague

Conclusions

By defining semantic templates for e-documents of different granularities, this paper proposes an approach to automatically generating semantics for scientific e-documents. The approach uses a document’s keywords, their relations, and the implied meaning of co-occurring keywords, which previous semantic representation approaches find hard to exploit and reason about. The semantics can be constructed, reasoned about, composed and decomposed at different granularity levels as required. So it

References (18)

H. Zhuge. Active E-document framework ADF: model and tool. Information and Management (2003).
H. Zhuge. Retrieve images by understanding semantic links and clustering image fragments. Journal of Systems and Software (2004).
O. Altamura et al. Transforming paper documents into XML format with WISDOM++. International Journal of Document Analysis and Recognition (2000).
T. Berners-Lee et al. The semantic Web. Scientific American (2001).
J. Heflin et al. A portrait of the semantic Web in action. IEEE Intelligent Systems (2001).
J.P. Hobbs, M. Sticket, et al. Interpretation as abduction. In: Proceedings of the 26th Annual Meeting of the... (1988).
B. Kosko. Fuzzy Engineering (1997).
K.C. Lee et al. A cognitive maps simulation approach to adjusting the design factors of the electronic commerce Web sites. Expert Systems with Applications (2003).
Z.Q. Liu et al. Contextual fuzzy cognitive maps for decision support in geographic information systems. IEEE Transactions on Fuzzy Systems (1999).

There are more references available in the full text version of this article.

Hai Zhuge is the chief scientist of the China Semantic Grid project funded by the National Basic Research Program of China. He is a professor and the director of the Key Lab of Intelligent Information Processing at the Institute of Computing Technology, Chinese Academy of Sciences, and the founder of the China Knowledge Grid Research Group (http://kg.ict.ac.cn), which has over 30 young researchers. He has presented over 10 keynotes at international conferences. He was the co-chair of the 2nd International Workshop on Knowledge Grid and Grid Intelligence, the program co-chair of the 4th International Conference on Grid and Cooperative Computing, and the co-chair of the 1st International Conference on Semantics, Knowledge and Grid. He has organized several journal special issues on Knowledge Grid and Semantic Grid. He is serving as the Area Editor of the Journal of Systems and Software, the Associate Editor of Future Generation Computer Systems, the Area Editor of the Journal of Computer Science and Technology, and an editorial member of Information and Management and of Electronic Commerce Research and Applications. His major research interest is the model, theory and methodology of the future interconnection environment. His monograph The Knowledge Grid is the first book in the area and received the 2005 Top Award of SONY Excellent Research. He is the author of over ninety papers, which have appeared mainly in leading international journals such as Communications of the ACM; IEEE Computer; IEEE Transactions on Knowledge and Data Engineering; IEEE Intelligent Systems; IEEE Computing in Science and Engineering; and IEEE Transactions on Systems, Man, and Cybernetics. One of them was among the top 1% of highly cited papers in the area according to the ISI Essential Science Indicator. He is a senior member of the IEEE and a member of the ACM. He was among the top scholars in the software engineering and systems area (1999–2003) according to the assessment report published in the Journal of Systems and Software.

Xiangfeng Luo is a postdoctoral researcher in the China Knowledge Grid Research Group at the Institute of Computing Technology, Chinese Academy of Sciences. His research fields include knowledge capturing, Semantic and Knowledge Grid, artificial intelligence and pattern recognition. He is in charge of a research project supported by the National Science Foundation of China.

^☆: Research work is supported by the National Basic Research Program of China (973 project no. 2003CB317000) and the National Science Foundation of China (grants 60273020, 60402016 and 70271007).
