Journal of Web Semantics

Volume 24, January 2014, Pages 11-17

SPUD—Semantic Processing of Urban Data

https://doi.org/10.1016/j.websem.2013.12.003

Abstract

We present SPUD, a semantic environment for cataloging, exploring, integrating, understanding, processing and transforming urban information. A series of challenges are identified: namely, the heterogeneity of the domain and the impracticality of a common model, the volume of information and the number of data sets, the requirement for a low entry threshold to the system, the diversity of the input data, in terms of format, syntax and update frequency (streams vs static data), the complex data dependencies and the sensitivity of the information. We propose an approach for the incremental and continuous integration of static and streaming data, based on Semantic Web technologies and apply our technology to a traffic diagnosis scenario. We demonstrate our approach through a system operating on real data in Dublin and we show that semantic technologies can be used to obtain business results in an environment with hundreds of heterogeneous datasets coming from distributed data sources and spanning multiple domains.

Introduction

Urban data comes in many forms, shapes and sizes. Government agencies are increasingly making their data accessible to promote transparency and economic growth. Since the first data.gov initiative launched by the US government, many city agencies and authorities have made their data publicly available through content portals: New York City, London, San Francisco, Boston and Dublin, to name a few. Meanwhile, Linked Data has emerged as a way to integrate information across sources and domains. Managing Open and Linked Data requires publishers to invest significant resources. A critical question for government agencies is what return on investment they are getting for the resources spent in making their data open. This may come as an increase in economic activity in their constituencies, a decrease in administration costs and increased transparency.

User-generated content can provide information outside the scope of traditional data sources. For example, a traffic jam that emerges due to an unplanned protest may be captured through a Twitter stream, but missed when examining weather conditions, event databases, reported roadworks, etc. Additionally, weather sensors in the city tend to miss localized events such as flooding. Combined, however, these views can provide a richer and more complete picture of the state of the city, merging traditional data sources with messy and unreliable social media streams.

The urban data emerging from such sources may be used to support various operations such as exploration, visualization, querying and diagnosis. Nevertheless, the cost associated with integrating all of this information is prohibitive. Our claim is that semantic technologies can be used to drastically lower the entry cost of accessing the information of a city. We demonstrate a technology platform that addresses key business challenges for urban information management: (a) publication of a dataset, focusing on privacy protection and semantic annotation; (b) reporting and consolidation of multi-faceted information, focusing on searching and visualizing heterogeneous data from several sources, including social media, linked data and government data, and aggregating this information into a single view; and (c) in-depth analysis of this information, to derive conclusions with significant business value. Examples of such conclusions include the detection and diagnosis of events or anomalies.

The novelty of SPUD lies in the ability of the system to ingest highly heterogeneous data and process it in an incremental manner. Unlike other approaches, the cost of entry is minimal (i.e. datasets can be imported as they are), and processing (annotation, linking, integration) can be done incrementally, while fully exploiting the power of semantic technologies. In addition, we show how a stack based on semantic technologies can go a long way without the need for global integration, or even for linking the entire input. We demonstrate SPUD using hundreds of real-world datasets published by 4 local authorities on an open data platform, datasets from the Semantic Web, and data retrieved from social media and other Web sources. SPUD incorporates several research efforts for which we provide extensive descriptions in [1], [2], [3], [4], [5].
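The idea of importing datasets as they are and adding semantics later can be illustrated with a small sketch. This is not SPUD's implementation or API: the `Catalog` and `Dataset` classes, the `example.org` concept URI and the two tiny datasets are all hypothetical, made up only to show that a catalog can answer a cross-dataset question over whatever part of the input has been annotated so far.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A dataset stored exactly as imported, plus the semantics added so far."""
    name: str
    rows: list[dict[str, str]]
    # Column name -> concept URI; starts empty and grows incrementally.
    annotations: dict[str, str] = field(default_factory=dict)


class Catalog:
    """Hypothetical catalog used for illustration; not SPUD's actual API."""

    def __init__(self) -> None:
        self.datasets: dict[str, Dataset] = {}

    def ingest(self, name: str, rows: list[dict[str, str]]) -> None:
        # Import the raw rows; no schema or mapping is required up front.
        self.datasets[name] = Dataset(name, rows)

    def annotate(self, name: str, column: str, concept: str) -> None:
        # At any later time, attach a concept to a single column.
        self.datasets[name].annotations[column] = concept

    def values_of(self, concept: str) -> list[tuple[str, str]]:
        # Collect values from every column annotated with `concept`.
        # Datasets without that annotation are skipped, not rejected.
        found = []
        for ds in self.datasets.values():
            for column, annotated in ds.annotations.items():
                if annotated == concept:
                    found.extend((ds.name, row[column]) for row in ds.rows)
        return found


STREET = "http://example.org/vocab#Street"  # placeholder URI, an assumption

catalog = Catalog()
# Made-up rows with deliberately inconsistent column names.
catalog.ingest("roadworks", [{"st": "Westland Row", "status": "closed"}])
catalog.ingest("ambulance_calls", [{"street_name": "Westland Row", "delay_min": "12"}])

before = catalog.values_of(STREET)    # nothing is annotated yet
catalog.annotate("roadworks", "st", STREET)
partial = catalog.values_of(STREET)   # only one dataset contributes
catalog.annotate("ambulance_calls", "street_name", STREET)
linked = catalog.values_of(STREET)    # both datasets contribute
```

In this sketch, each annotation widens what a query can reach without any global schema, which mirrors the claim that useful results do not require integrating or linking the entire input.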

The rest of this paper is structured as follows: Section 2 presents a set of motivating use-cases centered around Dublin. Section 3 outlines our approach. The main research methods and technologies applied in SPUD are outlined in Section 4 and a deployment is presented in Section 5. We present the related work in Section 6 and conclude in Section 7.

Use-cases

We present SPUD through a series of business cases pertaining to ambulance response times in Dublin. The target audience has various roles within public administration (or in contracted entities working with public authorities) and varied competency with regard to semantic technologies. For each use-case, we outline how SPUD addresses the related challenges.

Approach

In this section, we describe the general approach taken in SPUD. Fig. 1 summarizes the steps taken to go from raw data to a useful business result, from a data management perspective.

Technologies

In this section, we outline the core technologies used in SPUD, providing pointers to more thorough descriptions, where available.

Deployment

Fig. 3 presents a high-level architecture of the components and technologies described in the previous sections. The main elements of our architecture are: a set of APIs (mostly REST), with which an HTML5-based front-end interfaces; an IBM WebSphere Application Server, which runs the main application logic and Enterprise Apps such as the Diagnoser and the Trajectory Miner; a publishing container to facilitate the transfer of large files; and an enterprise SAN for file storage. We have used

Related work

Often, urban data is sourced from legacy non-relational systems or spreadsheets made for consumption by humans. The data is potentially very large, highly heterogeneous, spanning different domains, and with unknown structure (from static data to spatial–temporal data obtained from physical sensors). Moreover, the users who want to consume these data are not data integration experts and are not necessarily able to query data using structured query languages. We look at existent semantic

Conclusions and future work

In this paper, we have presented an end-to-end semantic approach to extract interesting business results from a combination of open datasets, proprietary datasets and social media, all pertaining to urban information. We have illustrated that Semantic Technologies are indeed applicable to complex business problems, and can cope in scenarios such as the one presented in Section 2 with acceptable performance overheads, at least for cities the size of Dublin and a focused domain. We have

References

  • L. Ding et al., TWC LOGD: A portal for linked open government data ecosystems, Web Semant.: Sci. Serv. Agents on the World Wide Web (2011)
  • J. De Kleer et al., Characterizing diagnoses and systems, Artif. Intell. (1992)
  • V. Lopez et al., Guided exploration and integration of urban data
  • E.M. Daly et al., Westland row why so slow?: fusing social media and linked data sources for understanding real-time traffic conditions
  • G. Di Lorenzo et al., EXSED: an intelligent tool for exploration of social events dynamics from augmented trajectories
  • F. Lécué et al., Applying semantic web technologies for diagnosing road traffic congestions
  • V. Lopez et al., Queriocity: a linked data platform for urban information management, Int. Semant. Web Conf. (2012)
  • W. Qiu et al., Failure diagnosis of discrete event systems with linear-time temporal logic fault specifications, IEEE Trans. Automat. Control (2002)
  • B.C.M. Fung et al., Privacy-preserving data publishing: a survey of recent developments, ACM Comput. Surv. (2010)
  • M. Paolucci, T. Kawamura, T.R. Payne, K. Sycara, Semantic matching of web services capabilities, in: Proceedings of the...