An intelligent query processing for distributed ontologies

https://doi.org/10.1016/j.jss.2009.06.008

Abstract

In this paper, we propose an intelligent distributed query processing method that takes into account the characteristics of a distributed ontology environment. We suggest more general models of the distributed ontology query and of the semantic mapping among distributed ontologies than those of previous works. Our approach rewrites a distributed ontology query into multiple distributed ontology queries using the semantic mapping, and the integrated answer is obtained by executing these queries. Furthermore, we propose a distributed ontology query processing algorithm with several query optimization techniques: pruning rules to remove unnecessary queries, a cost model considering site load balancing and caching, and a heuristic strategy for scheduling plans to be executed at a local site. Finally, experimental results show that our optimization techniques are effective in reducing the response time.

Introduction

In the Semantic Web, the definitions of resources and the relationships between resources are described by an ontology so that the resources can be interpreted automatically and useful information can be retrieved. Resources on the Web are generated independently in many locations. Thus, even if ontologies describe resources in the same (or a similar) domain, they can use different representations (i.e., language and schema). Also, the ontologies are managed by various local ontology management systems, which have different capabilities and strategies for storage and query processing. In such an environment, some Web applications want to access the ontologies without regard to the heterogeneity and the dispersion of the ontologies and the local systems. To support such requests, efficient query processing over the distributed ontologies is essential. Of course, existing distributed query processing techniques can be applied to query the distributed ontologies. However, they are limited in efficiency and functionality because they do not consider some important characteristics of a distributed ontology environment.

Fig. 1 shows an example of the distributed ontology environment. There are three ontologies, UNIV, COLLEGE, and PUB, which are managed at three different sites by two types of local systems (i.e., LS1 and LS2). UNIV and COLLEGE describe information about the university and the college, respectively, and PUB describes publication information. For simplicity, we describe only the schema and omit the instance part. These ontologies are generated independently but are related to each other even though they have different schemas. For example, let us suppose the following conditions: first, the concept of Professor in UNIV is defined as the concept of Lecturer in COLLEGE. Second, the information about the authors in PUB can be found in UNIV and COLLEGE. In this distributed ontology environment, consider the following example queries:

Example 1

Q1: Find professors who teach ‘Algorithm’.

Example 2

Q2: Find authors who wrote publications about ‘Semantic Web’ and also retrieve the name and the email addresses of the authors.

To find the answer to query Q1, we should retrieve professors and lecturers who teach ‘Algorithm’ from UNIV and COLLEGE, respectively. For query Q2, UNIV and COLLEGE should be searched along with PUB to find the personal information of the authors who wrote papers about ‘Semantic Web’. To efficiently find the answer to such a query, which is dispersed over several ontologies and local sites, a distributed query processing method that considers the heterogeneity of the ontologies is required.
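
The rewriting needed for Q1 can be sketched as follows. This is a minimal illustration rather than the paper's algorithm: the representation of a query as a list of (ontology, term) pairs, the `EQUIVALENT` table, and the property names `teaches` and `lectures` are assumptions made here; only the Professor/Lecturer correspondence comes from the example above.

```python
from itertools import product

# Hypothetical equivalence mappings between terms of different ontologies.
# Only Professor/Lecturer is taken from the running example; the property
# names "teaches" and "lectures" are assumed for illustration.
EQUIVALENT = {
    ("UNIV", "Professor"): [("COLLEGE", "Lecturer")],
    ("UNIV", "teaches"): [("COLLEGE", "lectures")],
}


def alternatives(term):
    """Return the term itself plus every term mapped as equivalent to it."""
    return [term] + EQUIVALENT.get(term, [])


def rewrite(query):
    """Expand a query (a list of (ontology, term) pairs) into all rewritings.

    Each term is replaced, independently, by itself or by one of its
    mapped equivalents, so the result is the cross product of the choices.
    """
    choices = [alternatives(term) for term in query]
    return [list(combo) for combo in product(*choices)]


# Q1 phrased against UNIV: professors who teach a given course.
q1 = [("UNIV", "Professor"), ("UNIV", "teaches")]
rewritings = rewrite(q1)

for r in rewritings:
    print(r)
```

The cross product also yields rewritings that mix terms from UNIV and COLLEGE. Whether such mixed queries are kept or discarded depends on the pruning rules, which this sketch does not model.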

The use of semantic mapping is a representative approach to dealing with the heterogeneity among different ontologies (Borgida and Serafini, 2003, Haase and Motik, 2005, Motik et al., 2004, Serafini and Andrei, 2005). In Borgida and Serafini (2003) and Serafini and Andrei (2005), the semantic mapping is the semantic relationship (i.e., subsumption or equivalence) between concepts (i.e., classes or properties) in two different ontologies, and it has been extended to a relationship between views (i.e., queries) (Haase and Motik, 2005, Motik et al., 2004). However, the previous works support neither a more general semantic mapping nor a distributed query covering more than two ontologies. Besides, most of them focus only on rewriting the query using the semantic mapping and do not address efficient distributed query evaluation (i.e., query rewriting, scheduling, and execution).

In this paper, we resolve issues of the distributed query processing over multiple heterogeneous ontologies. We extend the models of the distributed query and the semantic mapping to support more general distributed ontology query answering compared with previous works. Furthermore, we present a distributed ontology query processing algorithm with several query optimization techniques considering the characteristics of the distributed ontology environment.

The contributions of the paper are as follows:

Extended models of the distributed ontology query and the semantic mapping: We present a general distributed ontology query model to cover multiple different ontologies. We also present a general semantic mapping model in which more than two ontologies can be associated. The extension of query and semantic mapping models makes it possible to include relevant data which could not be accessed before in the query result. Also, our approach logically integrates independently grown distributed ontologies through the query rewriting based on the semantic mapping. As a result, we can efficiently extract an integrated answer of a distributed query over different ontologies.

Optimization techniques for an efficient query processing on the distributed ontologies: Multiple distributed queries are generated from an original distributed query to obtain results from dispersed ontologies. In order to remove unnecessary operations and to increase the parallelism among executions of the multiple queries, we suggest several optimization techniques. First, we present pruning rules to remove invalid and redundant queries. Second, we suggest a heuristic strategy for scheduling plans to be executed at a local site. Third, we propose a cost model considering site load balancing and caching for processing multiple distributed queries.
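
To make the first and third techniques concrete, the sketch below shows one possible shape for them. It is a toy under stated assumptions, not the paper's method: the validity test (a term must exist in its ontology's schema), the duplicate test, the cost formula, and all names (`prune`, `estimated_cost`, `pick_site`) are hypothetical, since the actual pruning rules and cost model are not given in this excerpt.

```python
def prune(queries, known_terms):
    """Drop redundant and invalid rewritings (assumed rules, see lead-in).

    A rewriting is treated as redundant if an identical set of terms was
    already kept, and as invalid if it uses a term that the named
    ontology's schema does not define.
    """
    seen = set()
    kept = []
    for query in queries:
        key = frozenset(query)
        if key in seen:
            continue  # redundant: same terms as an earlier rewriting
        if any(term not in known_terms.get(onto, set()) for onto, term in query):
            continue  # invalid: refers to a term missing from the schema
        seen.add(key)
        kept.append(query)
    return kept


def estimated_cost(base_cost, site_load, cached):
    """Toy cost: free if the result is cached, else scaled by site load."""
    return 0.0 if cached else base_cost * (1.0 + site_load)


def pick_site(site_loads, base_cost, cache, query_key):
    """Choose the candidate site with the lowest estimated cost.

    site_loads maps a site name to its current load in [0, 1];
    cache is a set of (site, query_key) pairs with cached results.
    """
    return min(
        site_loads,
        key=lambda site: estimated_cost(
            base_cost, site_loads[site], (site, query_key) in cache
        ),
    )
```

Under this toy model a lightly loaded site is preferred, which spreads work across sites, unless a busier site already holds the cached result.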

The remainder of the paper is organized as follows: In Section 2, we review related work. In Section 3, we present a distributed ontology query model and a semantic mapping model. Section 4 describes a distributed query processing technique with several query optimization techniques over distributed ontologies. Section 5 contains the results of experiments. Finally, in Section 6, we conclude this paper.

Section snippets

Related work

Recently, research on query processing over distributed ontologies has been carried out. Stuckenschmidt et al. (2005) suggest a global data summary for locating data matching query answers in different sources and for query optimization. However, Stuckenschmidt et al. (2005) assume that all distributed ontologies can be accessed in a uniform way, as through a global schema. In other words, the heterogeneity of the schemas of the distributed ontologies is not considered. Besides, many tasks are

Preliminary

The ontology describes the definitions of resources and the semantic relationships among the resources. An ontology consists of a schema and instances. The schema defines concepts (i.e., class and property) and relationships between the concepts. In the instance part, the type (i.e., class) of a resource and the relationship (i.e., properties) between resources are declared according to the schema. The ontology is expressed in triples describing the relationships between concepts, between

Distributed ontology query processing

The schemas of multiple ontologies are provided to the user who wants to query the ontologies, and the user writes distributed ontology queries covering those ontologies. We assume that the schemas are provided in the same representation format and that the user is capable of understanding the content of each schema.

A distributed ontology query can be rewritten to multiple ontology queries according to semantic mappings. The answer of the original user query will be covered by the distributed

Experimental analysis

The previous studies on the query processing over distributed ontologies have focused on only the query rewriting for the distributed query processing, but not on the efficiency of the query evaluation. Thus, in order to show the efficiency of our query processing technique, we implemented AIDOS (An Intelligent Distributed Ontology query proceSsing), and empirically compared the performance of the distributed query processing of two versions of AIDOS; AIDOS-I and AIDOS-II. AIDOS-I prunes

Conclusion

In this paper, we introduce an intelligent distributed ontology query processing method. We suggest more general models of the distributed ontology query and the semantic mapping among distributed ontologies than those of previous works so as to be applicable to more general environments and to retrieve a richer query answer. Also, through query rewriting using the semantic mapping, we can obtain the integrated answer of a query over distributed ontologies without any global schema and

Acknowledgement

This research was supported in part by the Ministry of Knowledge Economy, Korea, under the Information Technology Research Center support program supervised by the Institute of Information Technology Advancement (grant number IITA-2008-C1090-0801-0031), and in part by Microsoft Research Asia.

References (20)

  • Adjiman, P., et al., 2007. SomeRDFS in the semantic web. J. Data Semantics VIII.
  • Borgida, A., et al., 2003. Distributed description logics: assimilating information from peer sources. J. Data Semantics.
  • Calvanese, D., Giacomo, G.D., Lenzerini, M., 1998. On the decidability of query containment under constraints. In:...
  • Calvanese, D., et al., 2008. Conjunctive query containment and answering under description logic constraints. ACM Trans. Comput. Logic.
  • Guo, Y., Pan, Z., Heflin, J., 2004. An evaluation of knowledge base systems for large OWL datasets. In: Proc. of ISWC,...
  • Haase, P., Motik, B., 2005. A mapping system for the integration of OWL-DL ontologies. In: Proc. of IHIS 2005, pp....
  • Haase, P., Wang, Y., 2007. A decentralized infrastructure for query answering over distributed ontologies. In: Proc. of...
  • Halevy, A.Y., 2001. Answering queries using views: a survey. VLDB J.
  • Halevy, A.Y., Ives, Z.G., Mork, P., Tatarinov, I., 2003. Piazza: data management infrastructure for semantic web...
  • Kossmann, D., 2000. The state of the art in distributed query processing. ACM Comput. Surv.
There are more references available in the full text version of this article.

Jihyun Lee is a Ph.D. student in the Division of Computer Science at the Korea Advanced Institute of Science and Technology (KAIST), South Korea. Her research interests include XML data management, the semantic Web, ontology data management, and information retrieval on the Web.

Jeong-Hoon Park is a Ph.D. student in the Division of Computer Science at the Korea Advanced Institute of Science and Technology (KAIST), South Korea. His research interests include the semantic Web, semantic annotation, ontology data management, and information retrieval on the Web.

Myung-Jae Park is a Ph.D. student in the Division of Computer Science at the Korea Advanced Institute of Science and Technology (KAIST), Korea. His research interests include XML, ontology and the semantic Web, and publish/subscribe systems.

Chin-Wan Chung received a Ph.D. degree from the University of Michigan, Ann Arbor in 1983. He was a Senior Research Scientist and a Staff Research Scientist in the Computer Science Department at the General Motors Research Laboratories (GMR). While at GMR, he developed Dataplex, a heterogeneous distributed database management system integrating different types of databases. Since 1993, he has been a professor in the Division of Computer Science at the Korea Advanced Institute of Science and Technology (KAIST), Korea. At KAIST, he developed a full-scale object-oriented spatial database management system called OMEGA, which supports ODMG standards. His current research interests include the semantic Web, the mobile Web, sensor networks and stream data management, and multimedia databases.

Jun-Ki Min is a professor in the school of Internet-Media at the Korea University of Technology and Education (KUT) in Korea. He received a Ph.D. degree from the Korea Advanced Institute of Science and Technology (KAIST), Korea in 2002. He was a senior researcher in Electronics and Telecommunications Research Institute (ETRI), Korea. While at ETRI, he developed UbiCore, which is a large volume stream data management system. He has written and published several articles in international journals and conference proceedings. His current research interests include XML, the semantic Web, sensor network and stream data management.
