Elsevier

Journal of Web Semantics

Volume 2, Issue 1, 1 December 2004, Pages 1-18

Generating transformational annotation for web document adaptation: tool support and empirical evaluation

https://doi.org/10.1016/j.websem.2004.08.001

Abstract

Web annotation is crucial for providing machine-understandable descriptions of Web resources, and has a number of applications such as discovery, qualification, and adaptation of Web documents. While annotations are often embedded into a Web document, they can also be associated externally by means of addressing expressions represented with the XPath language. However, creating external annotation solely with a conventional editor is not easy, because annotation authoring involves the maintenance and elaboration of addressing expressions as well as annotation contents. In addition, there has been little empirical study of robust pointing by XPath expressions, in spite of the increasing prevalence of the XPath language in emerging content adaptation systems. This paper proposes a classification of annotation tool design, taking account of differences in authoring methods and roles of annotation. On the basis of the classification, tools for generating external annotations are briefly explained along with applications of Web document adaptation for small-screen devices and portal site development. Robustness of the addressing expressions is then investigated, and practical implications for the reliable use of external annotation are drawn from empirical evaluation with evolving real-life Web documents.

Introduction

An annotation is a remark attached to a particular portion of a document, and the term covers a broad range of forms in the literature. Forms of annotation can be characterized along two dimensions: whether they are formal or informal, and whether they are tacit or explicit [28]. Annotation that allows structural specification resides at the most formal and explicit extreme. Web annotation is crucial for providing not only human-readable remarks but also machine-understandable descriptions, and has a number of applications such as discovery, qualification, and adaptation of Web documents [23].

As more and more Web-enabled personal devices become available for connecting to the Internet, the same Web documents need to be rendered differently on different client devices. Adaptation of Web documents to the delivery context, which may depend on client capabilities, network connectivity, or user preferences, is thus crucial for transparent Web access [9]. The long-term goal of our research is to establish technologies for adapting Web documents to the delivery context. Document adaptation or customization requires annotation that indicates how to modify the document at hand, because Web documents are usually created without considering such adaptation and are not provided with any additional information or hints for it.

Annotations can be embedded into a Web document as inline annotations, which are often created as extra attributes of document elements. Existing HTML browsers ignore unknown attributes added to HTML elements, so proprietary inline annotations do not disturb rendering. Because of this simplicity, inline annotation has been adopted as a way of associating annotation with HTML documents [12], [15], [27], [32]. An advantage of the inline approach is the ease of annotation maintenance, since there is no bookkeeping task of associating annotations with their target document. The inline approach, however, requires annotation authors (annotators) to have document ownership, because annotated documents need to be modified whenever inline annotations are created or revised.

On the other hand, the external annotation approach [16], [17] does not suffer from these issues related to document ownership. In addition, a clear distinction between content and annotation is desirable in light of the design guideline that content should be separated from presentation. Therefore, it is assumed in this study that annotation is maintained separately from a target document, and exploited dynamically at runtime by a document adaptation engine. It is important to note that an external annotation may point to portions of different Web documents, if it makes sense to apply the annotation to documents that share the same document fragment. The external annotation approach thus provides a promising way of facilitating the sharing and reuse of metadata for Web document repurposing [19].

An external annotation consists of two items: annotation content and an addressing expression. In this study, the addressing expressions are represented with the open standard XPath language [36], which allows pointing to arbitrary nodes in the document object model (DOM) [10]. Besides the XPath language, there is another open standard addressing language called XPointer [37]. The XPointer language, which is an extension of XPath, allows finer-grained pointing to substrings in character data and flexible pointing to multiple DOM-tree fragments. Despite the greater expressive power of the XPointer language, external annotations in this study are used only for the node-level adaptation of Web documents. Therefore, we adopted the XPath language as the scheme for addressing expressions in this paper.
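
The two-part structure described above can be illustrated with a minimal Python sketch. It is not the annotation language or the adaptation engine used in this paper: the `ExternalAnnotation` class, the `"remove"` action, the `apply_annotations` helper, and both sample pages are hypothetical, and the standard library's `xml.etree.ElementTree` supports only a limited subset of XPath. The point is only that the annotation lives outside the documents and that one annotation can be reused on different documents sharing the same fragment.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class ExternalAnnotation:
    xpath: str    # addressing expression (ElementTree's limited XPath subset)
    action: str   # annotation content; "remove" is a hypothetical instruction


def apply_annotations(source: str, annotations: list) -> ET.Element:
    """Parse a document and apply externally stored annotations to the parsed copy."""
    root = ET.fromstring(source)
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for ann in annotations:
        for node in root.findall(ann.xpath):
            if ann.action == "remove":
                parent_of[node].remove(node)
    return root


# Two hypothetical documents that share the same 'banner' fragment.
PAGE_A = "<html><body><div id='banner'>Ad</div><div id='main'>Product</div></body></html>"
PAGE_B = "<html><body><div id='main'>Review</div><div id='banner'>Ad</div></body></html>"

# One annotation, kept apart from both documents, reused on each of them.
hide_banner = ExternalAnnotation(xpath="./body/div[@id='banner']", action="remove")

for page in (PAGE_A, PAGE_B):
    adapted = apply_annotations(page, [hide_banner])
    print([div.get("id") for div in adapted.iter("div")])
```

The source strings are never modified; only the parsed copy is adapted at runtime, which mirrors the assumption that annotators need not own the target document.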

When annotations are attached simply as commentary to a target document, browser-based annotation tools are desirable even if the annotation is externally maintained. In the case of annotation for Web document adaptation, an addressing expression indicates the part of the document to be customized, and the annotation content specifies how the indicated portion should be modified. Creating such annotation for document customization is therefore not easy with only a conventional editor for commentary annotations, and it is important to provide advanced tool support for annotations for Web document adaptation.

In addition to the issues related to tool support, robustness of the addressing expressions is crucial for the use of external annotation. Since Web documents may change over time, it is not always obvious what kinds of addressing expression keep pointing to the same target element regardless of the document changes. It was reported that a key complaint in the use of electronic annotation was the situation in which an annotation cannot point to any portion of a target document [5]. This is an issue of robust positioning, which has been investigated in a couple of empirical studies [31], [4]. However, there has been little study of robust pointing by XPath expressions, in spite of the increasing prevalence of the XPath language not only for use with XSLT [38], but also in emerging content adaptation systems [17], [34], [29], [3].
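
The robustness problem can be made concrete with a small Python sketch. Everything in it is an assumption for illustration: the three page snapshots, the two expressions, and the `resolve` and `survives` helpers are hypothetical, "the same node" is approximated by comparing `id` attributes, and `xml.etree.ElementTree` implements only a subset of XPath. The sketch makes no claim about which kind of expression the tools in this paper generate or which kind the evaluation found more robust.

```python
import xml.etree.ElementTree as ET

# Three hypothetical snapshots of one evolving page.
V1 = "<html><body><div id='nav'/><div id='main'>News</div></body></html>"
V2 = "<html><body><div id='promo'/><div id='nav'/><div id='main'>News</div></body></html>"
V3 = "<html><body><div id='main'>News</div></body></html>"

POSITIONAL = "./body/div[2]"      # second div child of body
ANCHORED = ".//div[@id='main']"   # any div whose id attribute is 'main'


def resolve(source, xpath):
    """Return the id of the first node the expression points to, or None."""
    node = ET.fromstring(source).find(xpath)
    return None if node is None else node.get("id")


def survives(xpath, baseline, later_versions):
    """Report, per later version, whether the expression still reaches its baseline target."""
    expected = resolve(baseline, xpath)
    return [resolve(version, xpath) == expected for version in later_versions]


print(survives(POSITIONAL, V1, [V2, V3]))
print(survives(ANCHORED, V1, [V2, V3]))
```

In this toy case the positional expression fails in two different ways: in `V2` it silently points to a different element, and in `V3` it points to nothing at all, which corresponds to the complaint reported in [5].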

The objective of this study is to propose a classification of annotation tools that generate annotations for Web document adaptation, and to draw implications for the reliable use of external annotation on the basis of empirical evaluation with evolving real-life Web documents. In the next section, we clarify the different roles of annotations for assertion and transformation, and then introduce variations in annotation tool design, distinguishing two types of authoring methods: annotation by selection and annotation by example. Section 3 explains a page-clipping annotation language and tools for generating the clipping annotation, along with applications of the annotation to document clipping for small-screen devices and portal site development. In Section 4, we investigate the robustness of addressing expressions, taking account of the changes in real-life Web documents. In particular, we examine to what extent the XPath expressions generated by the tools continued to point to the same nodes as the documents were updated during an observation period of 1 year and 6 months. Finally, we discuss the advantages and limitations of XPath expressions for practical use in external annotation.

Variations in annotation tool design

An annotation in general declares properties that qualify a particular portion of a target document. In some cases, however, annotations may indicate structural changes for the annotated portion of a target document. To clarify the distinction between these two roles, the former is called assertional annotation, while the latter is called transformational annotation [18]. Note that this distinction is not exclusive, because every annotation is intrinsically an assertion.

It is simple for annotators to

Annotation tools for web page clipping

Web pages for e-commerce, for example, contain a lot of information such as product descriptions, product images, and numerous links to other areas of the site. Even if the pages were created for desktop computers, it would be useful to deliver portions of those pages to users who access them through a Web-enabled phone rather than a desktop browser. In such a case, the images and nested HTML tables prepared for a nicely laid out page are a hindrance rather than a help. The sheer amount of

Robustness of addressing expressions

This section presents an empirical study that investigates the strengths and limitations of a class of XPath expressions generated by the selection-based and example-based annotation tools explained in the previous section. The evaluation method here follows the procedure proposed in our preliminary work [2] on the evaluation of robust pointing by the XPath language. Although the previous study was done in a shorter period (120 days), the empirical study presented in this section was conducted

Concluding remarks

In this paper, we presented variations in annotation tool design, and explained the two types of tools that generate transformational annotation for Web document clipping. Transformational annotations are descriptions of the ways of modifying the document at hand, which can easily be indicated through the annotator's editing actions to obtain the desired result of adaptation. Although the example-based annotation tool is the most sophisticated approach to creating transformational annotation, it

References (38)

  • M. Hori et al.

    Annotation-based Web content transcoding

  • T.A. Phelps et al.

    Robust intra-document locations

  • M. Abe, M. Hori, A visual approach to authoring XPath expressions, in: Proceedings of Extreme Markup Languages 2001,...
  • M. Abe et al.

    Robust pointing by XPath language: authoring support and empirical evaluation

  • C. Asakawa et al.

    Transcoding system for the non-visual Web access (2): annotation-based transcoding

  • A.J. Brush et al.

    Robust annotation positioning in digital documents

  • J.J. Cadiz et al.

    Using web annotations for asynchronous collaboration around documents

  • L. Denoue et al.

    An annotation tool for web browsers and its applications to information retrieval

  • S. DeWitt, Basic Web Clipping Using WebSphere Portal Version 4.1, IBM WebSphere Developer Domain....
  • Dublin Core Metadata Element Set, Version 1.1: Reference Description. Dublin Core Metadata Initiative, Recommendation....
  • Device Independence Principles, W3C Working Group Note, http://www.w3.org/TR/di-princ/,...
  • Document Object Model (DOM) Level 1 Specification Version 1.0, W3C Recommendation....
  • Eclipse.org Main page, Eclipse Foundation....
  • M. Erdmann et al.

    From manual to semi-automatic semantic annotation: about ontology-based text annotation tools

  • Everyplace Toolkit for WebSphere Studio. IBM Corp,...
  • S. Handschuh et al.

    Authoring and annotation of Web pages in CREAM

  • J. Heflin et al.

    Semantic interoperability on the Web

  • M. Hori, R. Mohan, H. Maruyama, S. Singhal, Annotation of Web Content for Transcoding. W3C Note,...
  • M. Hori et al.

    Annotation by transformation for the automatic generation of content customization metadata
