Visual subsetting, conversion and complex query exploitation in large spatio-temporal databases

https://doi.org/10.1016/j.compeleceng.2017.06.015

Abstract

In recent years, large amounts of multidimensional data have been generated at a rapid pace, creating the need to develop appropriate tools for visual data analytics and for enhancing the accessibility and discoverability of the data, as well as new algorithms and methods for seamless integration, manipulation and interactive visualization of data. We describe an interactive tool and technique for data mining and visualization, called SubVizCon, that integrates advanced concepts for spatio-temporal databases with the interactive visualization capabilities of geographic information system software. More specifically, we illustrate the development and implementation of the SubVizCon framework, which integrates geospatial data analysis, feature extraction, and visualization functions in ArcGIS with the advanced subsetting and querying functions of the Parametric Database Model. The seamless integration and interoperability of these software tools within the SubVizCon framework enable users to not only visualize large amounts of data but also discover spatio-temporal patterns that might otherwise remain hidden.

Introduction

The ever-increasing complexity of applications that involve manipulation of large volumes of multidimensional data presents new and exciting challenges regarding the ability to efficiently represent, query, and visualize such datasets, while preserving the declarative features of an SQL-like query language and interactive visualization. It has long been recognized that existing geographic information system (GIS) analysis and visualization tools are incapable of handling temporal data, while relational database models fail to handle multidimensional data that have not only highly complex associated operations but also spatial granularity. Furthermore, common multidimensional database technologies usually do not adequately support the spatial and temporal data structures of many real-world applications, thereby making spatio-temporal database indexing and querying complex and problematic. New tools are needed that not only allow spatial feature extraction and visualization through GIS geoprocessing, but also enable dynamic and interactive querying of spatial, temporal and spatio-temporal data.

Visualization techniques have also been recognized as powerful and relevant in applications involving spatial, temporal and spatio-temporal data, as they take advantage of human abilities to perceive visual patterns and interpret them. However, it is widely noted that the spatial data visualization features provided in many GIS software platforms are inadequate to handle the temporal dimensions of data. Therefore, alternative solutions for mining and visualization of spatio-temporal data are needed. Indeed, new tools and solutions should not only include interactive visualization of the results generated during the database query process, but also make it possible to dynamically and interactively obtain different spatial and temporal views of the data. For example, they should provide the functionality to switch dynamically between visual mapping display in GIS and complex spatio-temporal querying in database modeling systems. This can allow the user to discover spatio-temporal patterns that might otherwise remain hidden. Thus, the problems of how to perform complex queries on large heterogeneous databases and how to define effective interfaces for mapping and viewing such data are the challenges that constitute the primary focus of this work.

Motivated by the need to enhance the capability of GIS to handle spatial, temporal and spatio-temporal data and to create seamless and interoperable interfaces with spatio-temporal database models, two primary goals guided the work described in this paper. The first was to provide an efficient and interactive approach to extracting feature layers in ArcGIS's ArcMap and converting them into GML format, containing translations of geometries and associated data that can be discovered using standard database models and query operations. The second goal was to provide robust algorithms for interactive spatio-temporal query processing using advanced parametric database modeling [1]. The SubVizCon framework helps bring the two communities, GIS and the Parametric Database Model (ParaDB), together, so that the output of one can be used as input for the other with little or no effort.
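To make the first goal more concrete, the following is a minimal sketch, not the SubVizCon implementation, of subsetting point records by a bounding box and a year range and then serializing the selection as GML-like XML. The record fields, the `Station` element, and the `subset` and `to_gml` helpers are hypothetical names chosen for illustration; the sketch uses only the Python standard library rather than ArcGIS or ParaDB, and the output is not validated against a GML schema.

```python
import xml.etree.ElementTree as ET

# GML 3.1 namespace; registering the prefix keeps the output readable.
GML_NS = "http://www.opengis.net/gml"
ET.register_namespace("gml", GML_NS)

# Hypothetical records standing in for rows of a point feature layer.
RECORDS = [
    {"id": "st1", "lon": -93.6, "lat": 42.0, "year": 1990, "tmin_c": -12.5},
    {"id": "st2", "lon": -96.7, "lat": 40.8, "year": 1995, "tmin_c": -9.0},
    {"id": "st3", "lon": -93.6, "lat": 42.0, "year": 2001, "tmin_c": -15.1},
]


def subset(records, bbox, years):
    """Keep records inside bbox and inside the inclusive year range.

    bbox is (min_lon, min_lat, max_lon, max_lat); years is (first, last).
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    first, last = years
    return [
        r for r in records
        if min_lon <= r["lon"] <= max_lon
        and min_lat <= r["lat"] <= max_lat
        and first <= r["year"] <= last
    ]


def to_gml(records):
    """Serialize records as a GML-like feature collection (assumed layout)."""
    root = ET.Element(f"{{{GML_NS}}}FeatureCollection")
    for r in records:
        member = ET.SubElement(root, f"{{{GML_NS}}}featureMember")
        # "Station" is a hypothetical application-level feature type.
        feature = ET.SubElement(member, "Station", {"id": r["id"]})
        point = ET.SubElement(feature, f"{{{GML_NS}}}Point")
        pos = ET.SubElement(point, f"{{{GML_NS}}}pos")
        pos.text = f"{r['lon']} {r['lat']}"
        # Attribute data travels with the geometry as plain child elements.
        for key in ("year", "tmin_c"):
            ET.SubElement(feature, key).text = str(r[key])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    selected = subset(RECORDS, (-95.0, 41.0, -92.0, 43.0), (1985, 1995))
    print(to_gml(selected))
```

In this sketch the spatial and temporal filters are applied before conversion, so only the selected subset is translated; a real workflow could equally convert the whole layer first and leave the subsetting to the database query layer.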

The remainder of the paper is structured as follows. First, in Section 2, we provide a brief review of some existing spatio-temporal data models and of related work on the query and visualization of large spatio-temporal datasets. This is followed in Section 3 by a brief description of ParaDB and the ArcGIS software platform, which constitute the primary components of the SubVizCon framework. Next, in Section 4, we present the essential features and implementation of SubVizCon, together with results from an example use case. Finally, Section 5 draws some conclusions and delineates areas for future work.

Related work

The purpose of this work is to develop and demonstrate the capability of a software tool that better facilitates query discovery and geospatial visualization of large, heterogeneous datasets with both spatial and temporal granularity. The design goals for the SubVizCon tool are three-fold: 1) to allow extraction of geospatial features and subsets from ArcGIS ArcMap, 2) to efficiently manage the diverse sets of spatio-temporal data extracted from ArcMap so that they can be effectively queried and

Structure of the SubVizCon framework

In this section we briefly describe the individual components of the SubVizCon framework, namely the Parametric Database Model (ParaDB) and ArcGIS. Fig. 1 shows the linkages among these components.

Implementation of SubVizCon and results

In this section we illustrate the potential of SubVizCon for integrated data discovery and geospatial visualization by applying it to a subset of spatio-temporal climate, soil and crop data sets collected and organized by states within the north-central USA. The datasets were collected to enhance the efficient management of crops and to reduce risk factors associated with the implementation of agricultural practices. The NC-94 data sets provide the most comprehensive records of

Conclusions and future work

The generation of large, heterogeneous, and complex multidimensional datasets at a rapid pace from various disparate sources and devices, especially through social media, has created challenges for data discovery and data mining. The increasing complexity of applications that store, manipulate, analyze and visualize large heterogeneous datasets also presents new challenges regarding the ability to efficiently represent, query and visualize such datasets. The objective of this paper has been to

Acknowledgment

The authors are thankful to Williams Gutowski, Johnny Wong and Simanta Mitra at Iowa State University for their valuable suggestions and help in making this work possible.

Sugam Sharma obtained his PhD in Computer Science from Iowa State University, USA in 2013. His research interests include spatio-temporal databases, big data, and GIS. He also holds an MS in Computer Science from Jackson State University, USA and a BE in Computer Science & Engineering from Roorkee, India.

References (16)

  • M. Takatsuka et al., GeoVISTA Studio: a codeless visual programming environment for geoscientific data analysis and visualization, Comput Geosci (2002)
  • S.K. Gadia et al., Temporal databases: a prelude to parametric data
  • J.L. Mennis et al., A conceptual framework for incorporating cognitive principles into geographical database representation, Int J Geograph Inf Sci (2000)
  • A.M. MacEachren et al., Constructing knowledge from multivariate spatiotemporal data: integrating geographical visualization and knowledge discovery in database methods, Int J Geograph Inf Sci (1999)
  • S. Sharma et al., Geo-spatial patterns determination for SNAP eligibility in Iowa
  • S. Sharma et al., On analyzing the degree of coldness in Iowa, a north central region, United States: an XML exploitation in spatial database (NC94)
  • C. Lynch et al., Interoperability, scaling, and the digital libraries research agenda, 1995
  • M.F. Goodchild et al., Improved spatial data interoperability: a framework for geostatistical support-to-support interpolation (2013)

Udoyara Sunday Tim obtained his PhD in Civil & Environmental Engineering from Concordia University, Canada in 1987. Presently, he is an associate professor in Agricultural and Biosystems Engineering at Iowa State University. His research interests include the development and application of computer simulation models, decision support systems, data mining, virtual reality technology, and GIS technologies.

Shashi Gadia obtained his PhD in Mathematics from the University of Illinois, Urbana in 1977. Presently, he is an associate chair in Computer Science at Iowa State University. His main research interest is in databases with dimensions such as time, space, and beliefs. He has worked in temporal, spatial, and multilevel security databases, optimization, and query languages. He pioneered temporal databases.

Reviews processed and recommended for publication to the Editor-in-Chief by Guest Editor Dr. R. C. Poonia.
