1 Introduction

The main objective of our project was to transform large amounts of budget spreadsheet data available on the web into RDF linked data in order to make it accessible for cross-analysis and federated queries [3]. We realized that such a transformation requires a conceptual model in order to succeed. One of the most critical tasks was therefore to produce an ontology representing the knowledge accumulated in the open budget domain. An additional complication was that our team is distributed between Russia and Great Britain and not all of its experts are familiar with Semantic Web standards. We assume that many ontology development teams face the same challenges and that the solution we introduce can be beneficial to them and is sufficiently scalable.

As for the ontology engineering methodology, the team chose to work according to METHONTOLOGY: it supports reusability and suggests a clear life cycle with an evolving prototype model, thus ensuring an effective process for distributed teams.

1.1 Workflow Scenario

At the initiation stage our team adopted the following workflow scenario:

  1. Collecting the appropriate data sets of open budget data.

  2. Analysis of collected data and production of generalized concept maps.

  3. Construction of domain model fragments in sketches and drawings.

  4. Development of the ontology in authoring tool.

  5. Community review and approval.

For Stages 1, 2 and 3 the selection of tools is trivial and beyond the scope of this demo. For Stages 4 and 5 we prescribed tool categories according to the deliverables that we committed to ship (see Table 1). In the next step we specified the requirements for each tool category based on the deliverables and team limitations (see Table 2).

Table 1. Stage goals and tool categories
Table 2. Tool categories and requirements
Fig. 1. Workflow and the choice of corresponding tools

1.2 Requirements Justification

Free-to-use. Our project is non-profit and community-driven, and therefore we cannot afford any paid products at the present stage.

Publicity and visibility. We are strongly interested in contributions, for which openness is crucial.

Support for collaborative work. Our team members live in different time zones and work remotely from each other, hence we have a strong need for collaborative work solutions.

Diagrams as the main artifact. Since our team members vary greatly in their expertise and specialization, the only common language they can use is diagramming of the domain knowledge.

2 Technical Solution

2.1 Tool Selection

Ontology authoring tool. Initially the preferred candidate was WebProtégé due to its support for collaborative work and the online accessibility of its projects and files. Testing revealed that migrating to WebProtégé from an offline solution required too much effort, owing to peculiarities related to sign-in and to the assignment of URIs to objects, so we rolled back to limited usage of the offline Protégé. Since Protégé is not well suited to generating diagrams from ontologies and publishing them on the web, we still required a separate tool to visualize and publish ontologies for domain experts and the community. The presence of Protégé in the stack also forced us to couple it with an online repository hosting service in order to exchange data and keep the content under version control.

Ontology visualization tool. The selection of tools for this task is quite diverse. We consulted a detailed survey of available visualization tools [1] and found no particular tool that fulfilled our vision. We therefore committed to using what we already had at our disposal: one of the team members was a contributor to an ontology visualization tool development project for ITMO University. The tool had already been demonstrated at ISWC2015 under the name “Ontodia” and received good feedback [2]. Since the tool met most of the requirements except for integration with GitHub, and we maintain its code entirely, we decided to extend its functionality with integration capabilities. The integration solution is described in the next subsection.

Online repository hosting service. Given our requirements regarding visibility and publicity, we employed GitHub as our file publishing medium. The need for it arose from the decision to use Protégé as our only modelling environment. The presence of GitHub also resulted in an additional requirement: Ontodia had to be integrated with GitHub.

2.2 Integration of Ontodia with GitHub

The integration solution is straightforward. It was tested not only with GitHub but with WebProtégé as well. GitHub generates a permanent URL for the file page with the following structure: https://github.com/repoowner/reponame/blob/branchname/folder/subfolder/filename. To access the file itself, one needs a link with the following structure: https://raw.githubusercontent.com/repoowner/reponame/repobranch/folder/subfolder/filename. The transformation from the first form to the second was made with a simple regular expression operation.
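The URL rewriting described above might look roughly like the following Python sketch. It is an illustration under our own assumptions, not the code used in Ontodia: the function name `to_raw_url` and the exact pattern are hypothetical, and the sketch only handles the two URL structures quoted in the text.

```python
import re

# Hypothetical pattern for a GitHub file page URL:
# https://github.com/<owner>/<repo>/blob/<branch>/<path>
_BLOB_URL = re.compile(
    r"^https://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)/blob/(?P<rest>.+)$"
)


def to_raw_url(page_url: str) -> str:
    """Rewrite a GitHub file page URL into the matching raw-content URL."""
    match = _BLOB_URL.match(page_url)
    if match is None:
        raise ValueError(f"not a GitHub file page URL: {page_url}")
    # <rest> holds "<branch>/<path>", which is carried over unchanged.
    return "https://raw.githubusercontent.com/{owner}/{repo}/{rest}".format(
        **match.groupdict()
    )


if __name__ == "__main__":
    print(to_raw_url(
        "https://github.com/repoowner/reponame/blob/branchname/folder/subfolder/filename"
    ))
```

Dropping the `/blob` segment and switching the host is all the rewrite does, so branch names and nested folders pass through untouched.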

Fig. 2. Visualization of a part of the open budget ontology in Ontodia from GitHub OWL file

Ontodia already had a file upload feature implemented; therefore, once a link leading to the file is provided, Ontodia can use it as a new data source for building diagrams. From the user perspective, we created a new type of data source, which we tagged “GitHub source file” and which can be configured by providing a link to one or several files.

Another important feature for integration with any type of online data source is synchronization. We implemented two means of syncing with GitHub: (i) forced sync, where the user knows that changes were made to the source file, wants the updated version in Ontodia, and therefore initiates the file update; and (ii) regular sync, where Ontodia syncs with the latest version of the file once every 30 min.
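The two synchronization modes can be sketched as follows. This is a minimal illustration and not Ontodia's actual implementation: the class `GitHubSource`, its methods, and the injected `fetch` and `clock` callables are all hypothetical names introduced here; only the 30 min interval comes from the text.

```python
import time
from typing import Callable, Optional

SYNC_INTERVAL_SECONDS = 30 * 60  # regular sync period stated in the text


class GitHubSource:
    """Hypothetical data source that mirrors one raw file from GitHub."""

    def __init__(
        self,
        raw_url: str,
        fetch: Callable[[str], bytes],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.raw_url = raw_url
        self._fetch = fetch  # e.g. an HTTP GET returning the file body
        self._clock = clock
        self.content: Optional[bytes] = None
        self._last_sync: Optional[float] = None

    def force_sync(self) -> bytes:
        """Forced sync: the user explicitly asks for the latest file."""
        self.content = self._fetch(self.raw_url)
        self._last_sync = self._clock()
        return self.content

    def sync_if_due(self) -> bool:
        """Regular sync: refresh only if the interval has elapsed."""
        now = self._clock()
        if self._last_sync is None or now - self._last_sync >= SYNC_INTERVAL_SECONDS:
            self.force_sync()
            return True
        return False
```

In this sketch a scheduler would call `sync_if_due()` periodically, while a user-facing update action would call `force_sync()` directly. Injecting `fetch` and `clock` keeps the logic testable without network access.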

As a result, the user can connect Ontodia to a GitHub-hosted ontology, visualize all or part of it, share it with colleagues via their email addresses, and publish it with a permanent link on the web. Ontodia can also be used for preparing presentation slides by means of its bitmap and vector file export feature. A user invited to view the published ontology may explore the data using the filter button located underneath each node (see Fig. 2).

Figure 1 illustrates the proposed and well-tested workflow solution for collaborative ontology development with support for visibility and an iterative approach.

The proposed toolset covers the full cycle of ontology production: (a) making changes to the ontology in Protégé; (b) pushing the new file version to GitHub; (c) obtaining and publishing automatically updated diagrams in Ontodia.