Distributed ontology building as practical work

https://doi.org/10.1016/j.ijhcs.2010.12.011

Abstract

Ontologies – a form of structured and logically related knowledge or classification hierarchy embedded in a computer system – are regarded by many scientists as having enormous promise for the consistent use and re-use of data. To realise this promise, however, is not straightforward. In this paper, based on ethnographic observation, we argue that the challenges for ontology building are ‘social’ as much as they are technical. By this we mean that the routine work undertaken in the building process, and the problems and difficulties entailed, can be understood in terms of the practices of knowledge workers and the practical nature of ‘sorting things out’. Getting a better sense of how, in practice, this work gets done gives a sense of the main challenges of building successful ontologies and how this impacts on the design of tool support. In considering the practices of one group in particular, we try to show how, for members, the technical problems of determining what classification structure is appropriate, and what its boundaries might be, depend substantially on assumptions about the ‘community’ and its interests and purposes. This ‘turn to the social’ has ramifications for the understanding of ontology building and use. Specifically, ‘modelling’ approaches to ontology building tell us little about the practical organisation of the work and how this relates to the prospect of successful sharing. Ethnographic enquiry may reveal important issues that are otherwise missed.

Research highlights

► An ethnographic study of bio-informaticians engaged in ontology-building work.
► Close observation of the way in which a small group of experts engage in dealing with the practical issues that confront them and the procedures they adopt as they attempt to (re)build an ontology.
► Provides a comparison with ‘top-down’ and prescriptive ontology-building methodologies.
► Evaluates the use of technical and other resources necessary for the work to be done.

Introduction

Ontologies provide a means to formalise knowledge in machine-processable forms. Formalisations of this kind can be subjected to machine ‘reasoning’ that reveals the full set of logical relationships between various instances and classes, and also where logical inconsistencies are to be found. Recent interest in ontologies can be traced to the Semantic Web and its vision of Web users being able to have ‘intelligent agents’ assist them in discovering and interpreting information on the Web (Berners-Lee and Hendler, 2001). More recently, this interest has been further fuelled by the Semantic Grid, and its goal of automating the discovery and composition of distributed information resources and processing services (De Roure et al., 2001). The application of ontologies to knowledge representation in scientific research has been a particularly active area, as researchers struggle with the challenge of making more effective use of increasingly vast amounts of data and information (Hey and Trefethen, 2003). As a means to codify data in consistent, structured ways, ontologies hold enormous promise for the use and re-use of scientific data. Although by no means universally accepted, ontologies are in common use in scientific domains such as biology. A significant part of the value of an ontology derives from its being shared, and so we may think of an ontology as a form of cooperative system or infrastructure. It follows, however, that the degree to which an ontology is adopted may depend both on technical adequacy and on the degree to which community interests, purposes and politics can be managed; we gloss the latter as being ‘social’ concerns. As we shall see, the technical and the ‘social’ turn out to be closely intertwined.

The aim of this paper is to examine, mainly through a single case study, how a group of cell biologists and bio-informaticians go about constructing an ontology, the kinds of routine issues that arise for them, and what the implications might be for ontology building and ontology use. In fact, we will argue that what makes this especially interesting is firstly that ontology building is a form of knowledge work that entails relatively little separation between ‘designer’ and ‘user’ and, secondly, that technical problems are closely associated with issues concerning who should be involved in the building and what community of users is imagined. A significant part of the value of an ontology derives from its being shared, so we may think of an ontology as a form of cooperative system or infrastructure. As Bowker and Star (1999, p. 109) observe, ‘creating infrastructure is as much social, political and economic work as it is theoretical’. We might expect, then, that one of the main difficulties in successful ontology building is arriving at a consensus among builders and users, and this is borne out by our findings. Finding that consensus, however, is not easy. In the projects we have studied, much time and effort is spent reaching agreement about what should be in a given ontology and what should be left out. This, in turn, raises questions concerning what is the ‘right way’ to go about building an ontology, what tools should be used in its construction and how to support the process (Corcho et al., 2003).

The point we will make is that decisions, policies and strategies for ontology building can be thought of sociologically as issues to do with the assumptions that knowledge workers make when they do their work, the practical nature of their enquiries and the beliefs they carry when they make them. The specific variety of sociology in question is what has been termed ‘ethnomethodologically informed ethnography’ (see Randall et al., 2007). There is no space here to detail the analytic commitments entailed in this view, but it entails a rigorous commitment to detail, to understanding whatever is being done as meaningful and interactional ‘work’ and a refusal to engage in the epistemological and ontological questions that more conventional sociologies are concerned with. As such, it concerns itself with, and only with, the way in which ‘members’ – skilled and competent practitioners – accomplish the tasks they set themselves. It is, to put it succinctly, a practical sociology.

Ethnomethodologically informed ethnography has been regularly deployed in the context of Computer-Supported Cooperative Work (CSCW; see e.g. Randall et al., 2007) and Science and Technology Studies (STS; see Lynch, 1997). In relation to the former, it has been used very effectively to challenge top-down modelling processes in requirements engineering (see Goguen and Jirotka, 1994). The claim was that models of this kind should be complemented by a recognition of complexities that incorporate the ‘social’ (Anderson, 1994, Hughes et al., 1993) and that ethnography can make an effective contribution to requirements gathering and design decision making. STS has also used ethnographic techniques, although in a rather more evaluative way – and to rather more theoretical purposes – in the field of e-Science (see e.g. Baker and D., 2005, Baker and F., 2007).

There have been some recent attempts to incorporate this sociological notion into classification work in general and ontology-related work more specifically (see Pike and Gahegan, 2007). We wish to develop this. Our ethnographic approach provides, we suggest, some purchase on the practical issues relating to ontology building: an analysis that orients to the context-specific, and a preliminary analytic framework with which to understand the process better (see e.g. Ure et al., 2007, Lin et al., 2007, Lin et al., 2008).

In recent years, ethnographers have turned to more complex domains and to the knowledge work associated with them (see e.g. Harper et al., 2000). Here, the emphasis has been on a conception of knowledge that emphasises the practice of knowledge production. The dominant metaphor in the sociological approach to classification has become that of ‘expertise sharing’ (Ackermann et al., 2003) and research has emphasised the ordinary, practical ways in which knowledge or expertise is (or is not) shared across knowledge communities.

This involves thinking of ontology or classification building as ‘work’. Such a view owes much to Bowker and Star (1999), who point out that our understanding of classification systems often only reports on the final result – the classification scheme itself – and does not incorporate any appreciation of the work that went into its construction. Our purpose is to identify how this problem of knowledge work and its behavioural dimensions resonates with a number of issues, which in Bowker and Star's terms are ‘challenges’ for classification schema in terms of comparability, visibility and control.

Comparability refers to ‘regularity in semantics and objects’ (Bowker and Star, 1999, p. 231), that is, to the problem of whether and to what degree terms can be defined in consistent ways and hence shared unproblematically. We will endeavour to show that this comparability is not easily arrived at in the context of ontology building and is the source of regular revisiting, even in relatively homogeneous communities of users where one might assume underlying concepts (if not terminology) stand a good chance of being commonly held. Ontologies that should serve more heterogeneous purposes may turn out to be serving one user group more successfully than another, a problem that has been well attested in the field of medical informatics (Rector, 1999). As we show below, achieving this regularity entails the use of a wide range of resources, including both artefacts and the development environment.

The second of the challenges to classification recognised by Bowker and Star is visibility. ‘Invisible’ areas of work are ‘by definition unclassifiable except as the residual category: “other”’ (Bowker and Star, 1999, p. 231). ‘Invisible’ work can refer to those informal practices that do not, in themselves, constitute part of the ontology, but that may, nonetheless, be critical to how an ontology is constructed. As we shall see, this has ramifications both for decisions concerning what shape the ontology should take (the axes of classification) and for defining the method and scope of the ontology in question.

The third challenge is that of control, which, in Bowker and Star's terms, is a measure of the degree of prescription that a classification scheme imposes on its users. In the context of ontology building, we argue that this manifests itself largely as a problem of ‘enlistment’, entitlement and purpose—who should be involved in the building of ontologies, how and when. As we shall see, this is not trivial.

If Bowker and Star are right that these issues are central to any classification system, and if they pertain as much to ontology building as to any other kind of classification scheme, then it would seem that there are good reasons for examining the work that goes into the construction of ontologies, because that work will ramify in the development of ontologies and of tools to support them. Their success or failure when deployed will be a matter not only of their internal consistency, but also of the degree to which they meet organisational requirements. There may be a number of dimensions to this and below we sketch out what some of them may be, based on our own observations of ontology-building work conducted over a period of more than a year.

Case study

The evidence we reproduce below comes from an ongoing study of work by teams of people engaged in the building, maintenance and use of Protégé-OWL, a standard ontology building tool, and groups involved in developing and promoting the use of ontologies for scientific research. Over a period of more than 12 months, we have conducted a series of focus groups, ethnographic observation of face-to-face meetings and interviews with members of Collaborative Open Ontology Development Environment

Ontology building as expertise sharing

The problem of attaining stability in a classification scheme (one that is strongly implicated in any model of ontology maturation) can be understood, as we have indicated, as a practical problem for members—one that depends on the kind of obstacles that typically crop up.

Conclusions

We have argued that ontology building can be understood as a complex of socio-technical issues. As such, treating them merely as ‘engineering’ or ‘modelling’ problems runs the risk of missing some important features of the process. These features, which have to do with interactional and ‘work’ processes in which mutual understandings have to be identified and specific knowledges interwoven, can be glossed as ‘social’ although they closely relate to technical decision making. We have tried to

References (26)

  • Beck, K., Beedle, M., van Bennekum, A., Cockburn, A., Cunningham, W., Fowler, M., Grenning, J., Highsmith, J., Hunt,...
  • Berners-Lee T., Hendler, J., 2001. Scientific Publishing on the Semantic Web....
  • J. Bowker et al., 1999. Sorting Things Out: Classification and Its Consequences.