SPUD—Semantic Processing of Urban Data
Introduction
Urban data comes in many forms, shapes and sizes. Government agencies are increasingly making their data accessible to promote transparency and economic growth. Since the first data.gov initiative launched by the US government, many city agencies and authorities have made their data publicly available through content portals: New York City, London, San Francisco, Boston, and Dublin, to name a few. Meanwhile, Linked Data has emerged as a way to integrate information across sources and domains. Managing Open and Linked Data requires publishers to invest significant resources. A critical question for government agencies is what return on investment they are getting for the resources spent on making their data open. This may come as an increase in economic activity in their constituencies, a decrease in administration costs, and increased transparency. User-generated content can provide information outside the scope of traditional data sources. For example, a traffic jam that emerges due to an unplanned protest may be captured through a Twitter stream, but missed when examining weather conditions, event databases, reported roadworks, etc. Additionally, weather sensors in the city tend to miss localized events such as flooding. Combined, however, these views can provide a richer and more complete picture of the state of the city, by merging traditional data sources with messy and unreliable social media streams.
The urban data emerging from such sources may be used to support various operations such as exploration, visualization, querying and diagnosis. Nevertheless, the cost associated with integrating all of this information is prohibitive. Our claim is that semantic technologies can be used to drastically lower the entry cost of accessing the information of a city. We demonstrate a technology platform that addresses key business challenges for urban information management: (a) publication of a dataset, focusing on privacy protection and semantic annotation; (b) reporting and consolidation of multi-faceted information, focusing on searching and visualizing heterogeneous data from several sources, including social media, linked data and government data, and aggregating this information into a single view; and (c) in-depth analysis of this information, to derive conclusions with significant business value. Examples of such conclusions include the detection and diagnosis of events or anomalies.
The novelty of SPUD lies in the ability of the system to ingest highly heterogeneous data and process it in an incremental manner. Unlike other approaches, the cost of entry is minimal (i.e. datasets can be imported as they are), and processing (annotation, linking, integration) can be done incrementally, while fully exploiting the power of semantic technologies. In addition, we show how a stack based on semantic technologies can go a long way, without the need for global integration, or even for linking the entire input. We demonstrate SPUD using hundreds of real-world datasets published by four local authorities on an open data platform, datasets from the Semantic Web, and data retrieved from Social Media and other Web sources. SPUD incorporates several research efforts for which we provide extensive descriptions in [1], [2], [3], [4], [5].
The rest of this paper is structured as follows: Section 2 presents a set of motivating use-cases centered around Dublin. Section 3 outlines our approach. The main research methods and technologies applied in SPUD are outlined in Section 4 and a deployment is presented in Section 5. We present the related work in Section 6 and conclude in Section 7.
Section snippets
Use-cases
We present SPUD through a series of business cases pertaining to ambulance response times in Dublin. The target audience has various roles within public administration (or a contracted entity working with public authorities) and varied competency with regard to semantic technologies. For each use-case, we outline how SPUD addresses the related challenges.
Approach
In this section, we describe the general approach we take in SPUD. Fig. 1 summarizes the steps taken to go from raw data to a useful business result, from a data management perspective.
Technologies
In this section, we outline the core technologies used in SPUD, providing pointers to more thorough descriptions, where available.
Deployment
Fig. 3 presents a high-level architecture of the components and technologies described in the previous sections. The main elements of our architecture are a set of APIs (mostly REST) with which an HTML5-based front-end interfaces; an IBM WebSphere Application Server, where the main application logic and Enterprise Apps such as the Diagnoser and the Trajectory Miner run; a publishing container to facilitate the transfer of large files; and an enterprise SAN for file storage. We have used
Related work
Urban data is often sourced from legacy non-relational systems or spreadsheets made for consumption by humans. The data is potentially very large, highly heterogeneous, spanning different domains, and with unknown structure (from static data to spatial–temporal data obtained from physical sensors). Moreover, the users who want to consume this data are not data integration experts and are not necessarily able to query data using structured query languages. We look at existing semantic
Conclusions and future work
In this paper, we have presented an end-to-end semantic approach to extract interesting business results from a combination of open datasets, proprietary datasets and social media, all pertaining to urban information. We have illustrated that Semantic Technologies are indeed applicable to complex business problems, and can cope in scenarios such as the one presented in Section 2 with acceptable performance overheads, at least for cities the size of Dublin and a focused domain. We have
References (20)
- et al., TWC LOGD: A portal for linked open government data ecosystems, Web Semant.: Sci. Serv. Agents on the World Wide Web (2011)
- et al., Characterizing diagnoses and systems, Artif. Intell. (1992)
- et al., Guided exploration and integration of urban data
- et al., Westland row why so slow?: fusing social media and linked data sources for understanding real-time traffic conditions
- et al., EXSED: an intelligent tool for exploration of social events dynamics from augmented trajectories
- et al., Applying semantic web technologies for diagnosing road traffic congestions
- et al., Queriocity: a linked data platform for urban information management, Int. Semant. Web Conf. (2012)
- et al., Failure diagnosis of discrete event systems with linear-time temporal logic fault specifications, IEEE Trans. Automat. Control (2002)
- et al., Privacy-preserving data publishing: a survey of recent developments, ACM Comput. Surv. (2010)
- M. Paolucci, T. Kawamura, T.R. Payne, K. Sycara, Semantic matching of web services capabilities, in: Proceedings of the...