Elsevier

Computer Networks

Volume 42, Issue 5, 5 August 2003, Pages 579-598
CREAM: CREAting Metadata for the Semantic Web

https://doi.org/10.1016/S1389-1286(03)00226-3

Abstract

Richly interlinked, machine-understandable data constitute the basis for the Semantic Web. We provide a framework, CREAM, that allows for creation of metadata. While the annotation mode of CREAM allows creation of metadata for existing Web pages, the authoring mode lets authors create metadata––almost for free––while putting together the content of a page.

As a feature of our framework, CREAM allows creating relational metadata, i.e., metadata that instantiate interrelated definitions of classes in a domain ontology rather than a comparatively rigid template-like schema such as Dublin Core. We discuss some of the requirements one has to meet when developing such an ontology-based framework, e.g., the integration of a metadata crawler, inference services, document management and a meta-ontology, and describe its implementation, viz. OntoMat, a component-based, ontology-driven Web-page authoring and annotation tool.

Introduction

The Semantic Web builds on metadata describing the contents of Web pages. In particular, the Semantic Web requires relational metadata, i.e., metadata that describe how resource descriptions instantiate class definitions and how they are semantically interlinked by properties. We have carried out several case studies that build on this idea of the Semantic Web in order to provide intelligent applications that make knowledge about researchers, about companies and markets, and about research papers accessible by semantic means. An important cornerstone of the case studies, and of many other scenarios in the Semantic Web, is a framework and a mechanism that let users easily and comprehensively contribute the relational metadata that the Semantic Web builds upon. For this objective, we have developed our semantic annotation framework, CREAM (CREAting Metadata for the Semantic Web), that we present here. CREAM is geared to allow for the easy and comfortable creation of semantic metadata.
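To make the notion of relational metadata more concrete, the following minimal Python sketch contrasts a flat, template-like record with a set of subject-property-object statements. Every identifier in it (the `ex:` names, the class and property names, and the helpers `instances_of` and `linked`) is a hypothetical illustration and is not taken from CREAM or OntoMat.

```python
# Flat, template-like record: a fixed set of fields with plain string values.
flat_record = {"title": "Home page", "creator": "A. Researcher"}

# Relational metadata as (subject, property, object) statements.
# Resources instantiate classes and are linked to each other by properties.
# All names below are hypothetical examples.
statements = {
    ("ex:person1", "rdf:type", "ex:Researcher"),
    ("ex:org1", "rdf:type", "ex:University"),
    ("ex:paper1", "rdf:type", "ex:Publication"),
    ("ex:person1", "ex:worksAt", "ex:org1"),
    ("ex:person1", "ex:authorOf", "ex:paper1"),
}


def instances_of(cls, triples):
    """Return all resources declared as instances of the given class."""
    return sorted(s for (s, p, o) in triples if p == "rdf:type" and o == cls)


def linked(subject, triples):
    """Return the (property, object) links leaving a resource, types excluded."""
    return sorted((p, o) for (s, p, o) in triples
                  if s == subject and p != "rdf:type")
```

In the flat record the creator is only a string, whereas in the statement set `ex:person1` is itself a typed resource that can be followed to its organization and its publications.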

CREAM allows for the a posteriori annotation of existing resources. A posteriori creation of metadata involves the consideration of an item (e.g., a document) by an agent (possibly, but not necessarily, a human agent) and its description with metadata. For this purpose we distinguish two modes of interaction: (i) the process may include the identification of elements already existing within the item. For instance, one may consider an HTML page and identify its author by a footer appearing within the page; (ii) one may classify the item as belonging to a category like Business though neither the term itself nor a synonym or hyponym appears in the item.

A posteriori creation of metadata comes with a major drawback. In order to provide metadata about the contents of a Web page, the author must first create the content and second annotate the content in the additional, a-posteriori, annotation step. To avoid the work overhead, we propose a third mode of interaction, (iii) an author has the possibility to easily combine authoring of a Web page and the creation of relational metadata describing its content.

Scrutinizing the differences between the modes of interaction (i), (ii), and (iii), we found that the major problems an annotation framework must deal with are identical. In fact, we found it preferable to hide the border between annotation (i.e., (i) and (ii)) and authoring (i.e., (iii)) as far as possible within CREAM. Some questions, however, were triggered by the need to distinguish the ontology that defines the structure of the relational metadata from its use for annotation or authoring. For this purpose, we introduce here a meta-ontology that describes how the annotation and authoring modes of OntoMat interact with the classes and properties of the ontology proper. We thereby separate the ontology parts needed in the metadata creation process from the ones relevant for the targeted content description.

For the CREAM framework we have also provided a reference implementation, viz. OntoMat-Annotizer (OntoMat for short), which is freely available for download.

In the following we first describe the case studies from which we took a major part of our experiences for guiding the development of CREAM (Section 2). Then, we describe in detail some of the requirements that were derived from the case studies (Section 3). We explain our terminology in more detail and give an example of the metadata we want to create in Section 4. In Section 5, we derive the design of CREAM from the requirements elaborated before. In Section 6, we specify how the meta-ontology may modularize the ontology description from the way the ontology is used in CREAM. In Section 7, we explain the major modes of interaction with OntoMat, our implementation of CREAM. Before we conclude, we give a survey of related work in the areas of knowledge markup on the Web, knowledge acquisition, annotation environments and authoring environments.

Case studies for CREAM

Below, we describe three case studies that we have performed and that have guided our development of CREAM. The case studies have in common that they require the generation of metadata given an HTML document from which a human could identify relevant metadata entities.

During the studies, we repeatedly encountered several principal problems. Some of the problems were mostly syntactic, viz. people would easily make syntactic errors, e.g., closing XML parentheses incorrectly or not at all. In the

Requirements for CREAM

Given the problems with syntax, semantics and pragmatics experienced in the case studies described above, we can now list a more precise set of requirements. The principal requirements, which apply to a posteriori annotation as well as to the integration of Web page authoring with metadata creation, are as follows:

  • Consistency. Semantic structures should adhere to a given ontology in order to allow for better sharing of knowledge. For example, it should be avoided that people use an attribute,

Relational metadata

We elaborate the terminology we use in our framework, because many of the terms that are used with regard to metadata creation tools carry several, ambiguous connotations that imply conceptually important decisions for the design rationale of CREAM.

  • Ontology. An ontology is a formal, explicit specification of a shared conceptualization of a domain of interest [19]. In our case, an ontology is defined in RDF(S) or DAML + OIL. Hence, an ontology is constituted by statements expressing definitions of

CREAM modules

The requirements and considerations from Sections 1 to 4 feed into the design rationale of CREAM. The design rationale links the requirements with the CREAM modules. This results in an N:M mapping (neither functional nor injective). An overview of the matrix is given in Table 1.
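What it means for the mapping between requirements and modules to be neither functional nor injective can be sketched in a few lines of Python. The requirement-module pairs below are invented for illustration and do not reproduce Table 1; only the two checks matter.

```python
# Hypothetical (requirement, module) pairs; NOT the contents of Table 1.
mapping = {
    ("consistency", "ontology guidance"),
    ("consistency", "inference services"),
    ("requirement_b", "document editor"),
    ("requirement_b", "ontology guidance"),
}


def is_functional(pairs):
    """True if every requirement is linked to at most one module."""
    requirements = [req for req, _ in pairs]
    return len(requirements) == len(set(requirements))


def is_injective(pairs):
    """True if every module is linked to at most one requirement."""
    modules = [mod for _, mod in pairs]
    return len(modules) == len(set(modules))
```

In this sketch one requirement is served by several modules (so the relation is not functional), and one module serves several requirements (so it is not injective), which is the N:M situation described above.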

  • Document editor. The document editor may be conceptually––though not practically––distinguished into a viewing component and the component for

Meta-ontology

A meta-ontology is needed to describe how classes, attributes and relationships from the domain ontology should be used by the CREAM environment. Hence, it describes what role the properties in the domain ontology have in the annotation system. In particular, we have recognized the urgent need for the meta-ontology characterizations elaborated in Sections 6.1 (Label), 6.2 (Default pointing) and 6.3 (Property mode).
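As a rough illustration of this separation, the following Python sketch keeps the domain facts and the meta-ontology links in two independent sets, and lets a tool consult the links to decide which property to display as an instance's name. The `meta:` vocabulary, the `ex:` names and both helper functions are assumptions made for this example; the role name merely echoes the title of Section 6.1, and its exact semantics in CREAM are not modelled here.

```python
# Facts expressed with the domain ontology (hypothetical names).
facts = {
    ("ex:person1", "ex:name", "A. Researcher"),
    ("ex:person1", "ex:homepage", "http://example.org/~ar"),
}

# Separate module linking domain properties to meta-ontology roles.
# "meta:role" and "meta:Label" are assumed names, not CREAM's vocabulary.
meta_links = {
    ("ex:name", "meta:role", "meta:Label"),
}


def label_property(links):
    """Return the domain property tagged with the label role, if any."""
    for prop, pred, role in sorted(links):
        if pred == "meta:role" and role == "meta:Label":
            return prop
    return None


def display_name(instance, facts, links):
    """Pick a human-readable name for an instance via the label role."""
    prop = label_property(links)
    for s, p, o in sorted(facts):
        if s == instance and p == prop:
            return o
    return instance  # fall back to the raw identifier
```

Because the links live outside the domain statements, the same domain ontology could be reused with a different set of links, or with none at all.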

The meta-ontology is given with the annotation system. The annotator must establish the connection

Modes of interaction with OntoMat

The metadata creation process in OntoMat is actually supported by three types of interaction with the tool (also cf. Fig. 2):

  1. Annotation by typing statements. This involves working almost exclusively within the ontology guidance/fact browser.

  2. Annotation by markup. This mostly involves the reuse of data from the document editor/viewer in the ontology guidance/fact browser.

  3. Annotation by authoring Web pages. This mostly involves the reuse of data from the fact browser in the document editor.
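The direction in which data flows in each of the three interaction types can be sketched as follows. This is a simplified Python model under stated assumptions: documents are plain strings, the fact browser is a set of triples, and the function names and `ex:` identifiers are hypothetical rather than part of OntoMat.

```python
def annotate_by_typing(facts, subject, prop, value):
    """Type 1: the statement is entered directly; no document is involved."""
    facts.add((subject, prop, value))


def annotate_by_markup(facts, document, start, end, subject, prop):
    """Type 2: text selected in the document becomes the value of a fact."""
    value = document[start:end]
    facts.add((subject, prop, value))
    return value


def author_from_fact(facts, document, subject, prop):
    """Type 3: the value of an existing fact is written into the document."""
    for s, p, o in sorted(facts):
        if s == subject and p == prop:
            return document + o
    return document


facts = set()
page = "Contact: Jane Doe"
selected = annotate_by_markup(facts, page, 9, 17, "ex:person1", "ex:name")
annotate_by_typing(facts, "ex:person1", "rdf:type", "ex:Researcher")
new_page = author_from_fact(facts, "Team lead: ", "ex:person1", "ex:name")
```

Markup moves data from the document into the fact set, while authoring moves it from the fact set into a document; typing touches only the fact set.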


In

Comparison with related work

CREAM can be compared along four dimensions. First, it is a framework for markup in the Semantic Web. Second, it may be considered a particular knowledge acquisition framework that is to some extent similar to Protégé-2000 [13]. Third, it is certainly an annotation framework, though with a different focus than frameworks like Annotea [28]. Fourth, it is an authoring framework with an emphasis on metadata creation.

Conclusion

CREAM is a comprehensive framework for creating semantic metadata, relational metadata in particular––the foundation of the future Semantic Web. CREAM builds on comprehensive experience we have collected in several case studies on creating metadata for the Semantic Web. CREAM supports three modes of interaction for a posteriori annotation and for creating metadata while authoring a Web page. In order to avoid problems with syntax, semantics and pragmatics, CREAM employs a rich set of modules

Acknowledgements

The research presented in this paper has profited from discussions with our colleagues at University of Karlsruhe, Stanford University and Ontoprise GmbH. In particular, we want to thank Stefan Decker (now: Information Science Institute, USC), Alexander Maedche (now: FZI Research Center for Information Technologies), Mika Maier-Collin (Ontoprise), Tanja Sollazzo (now: Siemens) and Sichun Xu (Stanford University). We also thank the reviewers for their comments, which helped to improve this paper.

References (49)

  • N. Kushmerick, Wrapper induction: efficiency and expressiveness, Artificial Intelligence (2000)
  • P. Martin et al., Embedding knowledge in Web documents

  • T. Phelps et al., Robust intra-document locations, Computer Networks (2000)
  • S. Staab et al., Semantic community Web portals, Computer Networks (2000)
  • S. Bechhofer, L. Carr, C. Goble, S. Kampa, T. Miles-Board, The semantics of semantic annotation, in: ODBASE: First...
  • R. Benjamins et al., KA2: building ontologies for the internet: a midterm report, International Journal of Human Computer Studies (1999)
  • D. Brickley, R.V. Guha, RDF vocabulary description language 1.0: RDF schema. Technical Report, W3C, 2002, W3C Working...
  • J. Broekstra et al., Enabling knowledge representation on the Web by extending RDF schema

  • S. Decker, Semantic Web methods for knowledge management, Ph.D. Thesis, University of Karlsruhe,...
  • S. Decker, D. Brickley, J. Saarela, J. Angele, A query and inference service for RDF, in: Proceedings of the W3C Query...
  • S. Decker et al., Ontobroker: ontology based access to distributed and semi-structured information

  • L. Denoue, L. Vignollet, An annotation tool for Web browsers and its applications to information retrieval, in:...
  • S. DeRose, E. Maler, R. Daniel, XML pointer language (XPointer), Technical Report, W3C, 2001, Working Draft 16 August...
  • Dublin Core Metadata Initiative, April 2001, Available from...
  • Dublin Core Metadata Template, 2001, Available from...
  • M. Erdmann, A. Maedche, H.-P. Schnurr, S. Staab, From manual to semi-automatic semantic annotation: about...
  • H. Eriksson, R. Fergerson, Y. Shahar, M. Musen, Automatic generation of ontology editors, in: Proceedings of the 12th...
  • D. Fensel, J. Angele, S. Decker, M. Erdmann, H.-P. Schnurr, S. Staab, R. Studer, A. Witt, On2broker: semantic-based...
  • C. Fillies, F. Weichhardt, Graphische Entwicklung und Nutzung von Ontologien mit SemTalk in MS Office, in: R....
  • M. Frank, P. Szekely, R. Neches, B. Yan, J. Lopez, Web-scripter: world-wide grass-roots ontology translation via...
  • Reference description of the DAML+OIL (March 2001) ontology markup language, March 2001, Available from...
  • C. Goble, S. Bechhofer, L. Carr, D. De Roure, W. Hall, Conceptual open hypermedia=the semantic Web? in: S. Staab, S....
  • T.R. Gruber, A translation approach to portable ontology specifications, Knowledge Acquisition (1993)
  • S. Handschuh et al., Authoring and annotation of Web pages in CREAM
