Visual Exploration of Geolocated Time Series with Hybrid Indexing

doi:10.1016/j.bdr.2019.02.001

Big Data Research

Volume 15, March 2019, Pages 12-28

https://doi.org/10.1016/j.bdr.2019.02.001

Abstract

Geolocated time series are time series that correspond to specific locations. They can represent, for example, visitor check-ins at certain venues or readings of sensors installed at various places. The amount and significance of such time series have increased in many domains in recent years. However, although several works exist for time series visualization and visual analytics in general, there is a lack of efficient techniques for visual exploration and analysis of geolocated time series in particular. In this paper, we present two approaches that rely on hybrid spatial-time series indices to allow for efficient map-based visual exploration and summarization of geolocated time series data. In particular, we use the BTSR-tree index and we introduce a new variant of the iSAX index, called geo-iSAX. The former is a spatial-first hybrid index that extends the R-tree by maintaining bounds for the time series indexed at each node. Following a similar rationale, geo-iSAX is a time series-first hybrid index that maintains spatial MBRs of the geolocated time series indexed in each node. We describe the structure of these indices and show how they can be directly exploited to produce map-based visualizations of geolocated time series at different levels of granularity. We empirically validate our approach using two real-world datasets, as well as a synthetic one that is used to test the scalability of our methods.

Introduction

Time series are generated and stored at a rapidly increasing rate in many industrial and research applications, including the Web and the Internet of Things, public utilities, finance, astronomy, biology, and many more. A significant portion concerns geolocated time series, i.e., those generated at, or otherwise associated with, specific locations. While indexing, mining and exploring time series data has attracted a lot of interest from the database and data mining communities [1], [2], [3], the study of geolocated time series is still largely overlooked.

Geolocated time series can be found in various domains and applications. For instance, time series can be used to represent, analyze and forecast water consumption measured by smart meters installed in urban households [4]. Analyzing such time series can provide valuable insights regarding trends and patterns of consumer behavior in daily life. These results can then be used to forecast and balance water demand, as well as to plan and prioritize interventions that can guide consumers towards more sensible water use. Similar use cases can be found in other domains, such as geomarketing or mobile advertisement, where geolocated time series may represent the number of visitors or the revenue generated at a certain location across time. Extracting insights, trends and patterns can be significantly facilitated by map-based visualizations of summarized time series data. Such visualizations can reveal, for instance, which types of consumption patterns are most frequently observed among consumers in a certain area or what the spatial distribution of sales for a certain product looks like.

However, time series are an inherently complex data type, and such datasets can reach extremely large volumes, both horizontally (i.e., very long series of values across time) and vertically (i.e., time series generated by countless sources). Consequently, management, analysis and exploration of big time series data require efficient and scalable algorithms. In particular, visual exploration of geolocated time series needs to process the required information efficiently while the user interacts with the application. For example, whenever the user zooms in or scrolls the map, visual analytics and aggregates should be computed on-the-fly, e.g., identifying the predominant patterns in the time series and their spatial distribution within the currently visible map area.

Consider the example illustrated in Fig. 1(a). When the user zooms the map into the red rectangle, the visualization application should identify, summarize and present the two patterns (shown in blue and green) appearing therein. For such requests that inherently combine spatial filters with time series analysis, it is inefficient to evaluate each predicate separately, e.g., apply a spatial filter on the time series of a large dataset and then calculate summaries of the candidates, or vice versa. The same holds for exploration in the time series domain, as depicted in Fig. 1(b). Consider a user drawing a timebox (i.e., a rectangle in the time series domain) or zooming into the yellow part. The application should identify the time series that are fully contained within that filter area, i.e., their values along the specified time range fall within the value range (both ranges shown in orange in Fig. 1(b)), and then provide an informative summary comprising aggregate spatial information to avoid cluttering the map.
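
The combined request described above can be illustrated with a brute-force baseline, i.e., the kind of separate, unindexed evaluation that the hybrid indices are meant to avoid. This is only a minimal sketch: the function names, the (x_min, y_min, x_max, y_max) rectangle layout and the use of integer positions as time points are assumptions made for illustration, not definitions from the paper.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def in_timebox(series: Sequence[float], t_start: int, t_end: int,
               v_min: float, v_max: float) -> bool:
    """True if every value at positions t_start..t_end (inclusive)
    lies within [v_min, v_max]. An empty window trivially qualifies."""
    window = series[t_start:t_end + 1]
    return all(v_min <= v <= v_max for v in window)


def in_map_area(loc: Tuple[float, float], area: Rect) -> bool:
    """True if the location falls inside the visible map rectangle."""
    x, y = loc
    x_min, y_min, x_max, y_max = area
    return x_min <= x <= x_max and y_min <= y <= y_max


def naive_timebox_query(dataset, area: Rect, t_start: int, t_end: int,
                        v_min: float, v_max: float) -> List[int]:
    """Scan every (location, series) pair and keep the positions of those
    that satisfy both the spatial filter and the timebox."""
    return [i for i, (loc, series) in enumerate(dataset)
            if in_map_area(loc, area)
            and in_timebox(series, t_start, t_end, v_min, v_max)]
```

Every series is inspected here regardless of where it lies or what it looks like, which is why such a scan does not scale to interactive use on large datasets.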

Efficient filtering and retrieval over large datasets of geolocated time series can be enabled by indexing. Several approaches have been proposed that efficiently index large amounts of plain time series data. They either rely on Discrete Wavelet Transform to reduce the dimensionality of time series [5], or make use of a family of indices based on Symbolic Aggregate Approximation (SAX) [3], [6], [1], [7]. However, all aforementioned techniques index the data solely on the time series domain, not taking the spatial dimension into account. If each analyzed time series is inherently associated with a spatial attribute (e.g., locations of smart meters), such indexing is not sufficient for queries and visualizations that additionally involve spatial filters.

In this paper, we propose two geolocated time series summarization approaches for visual exploration, named bundle and tile map summary. These are supported and driven by two appropriate hybrid indices that speed up the result computation, providing efficient exploration of geolocated time series data. Each summary consists of a spatial and a time series part that jointly facilitate knowledge extraction and insight. The spatial summary is similar for both and consists of Minimum Bounding Rectangles (MBRs) of geolocated time series, grouped according to a specific predicate (i.e., spatial proximity or time series similarity). Each MBR is associated with a counter denoting the number of time series it contains. A visualization example of the spatial summary is depicted in Fig. 1(a), where the geolocated time series are organized in two groups (colored green and blue) according to their similarity. Each group is depicted along with a number that indicates how many time series it contains (i.e., three geolocated time series for the first group and four for the second).
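
As a rough illustration of the spatial summary, the following sketch computes one MBR and one counter per group. It assumes that group labels are already available (for example, from similarity grouping); the names `mbr`, `spatial_summary` and the dictionary layout are hypothetical and chosen only for this example.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Tuple[float, float]


def mbr(points: Sequence[Point]) -> Tuple[float, float, float, float]:
    """Minimum bounding rectangle as (x_min, y_min, x_max, y_max)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def spatial_summary(locations: Sequence[Point],
                    group_of: Sequence[Hashable]) -> Dict[Hashable, dict]:
    """Return, for each group label, the MBR of its locations and the
    number of geolocated time series it contains."""
    groups: Dict[Hashable, List[Point]] = {}
    for loc, label in zip(locations, group_of):
        groups.setdefault(label, []).append(loc)
    return {label: {"mbr": mbr(pts), "count": len(pts)}
            for label, pts in groups.items()}
```

With three locations labeled "green" and four labeled "blue", the result holds two MBRs with counters 3 and 4, mirroring the situation described for Fig. 1(a).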

The main difference between the two methods lies in the time series part of the summary. The bundle summary consists of sets of Minimum Bounding Time Series (MBTS) [8], an example of which is depicted in Fig. 2(a). An MBTS is a band with upper and lower bounds that encloses all time series of a set, conveying the range of the time series values along the time axis. On the other hand, the tile map summary (Fig. 2(b)) of a set of time series indicates, using a corresponding shading, the density of the time series points in each tile of a partitioning of the domain, obtained by discretizing the time and value axes. This way, it avoids the overplotting that would be caused by drawing a large number of resulting time series and conveys how the values of the time series are distributed across time.
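
Both time series summaries can be sketched in a few lines for a small set of equal-length series. The sketch below assumes that the MBTS is formed by pointwise minima and maxima and that tiles are defined by equal-width time bins plus a caller-supplied list of value edges; these choices, and the function names, are illustrative assumptions rather than the paper's exact definitions.

```python
from typing import List, Sequence, Tuple


def mbts(series_set: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Pointwise lower and upper bounds enclosing all series in the set."""
    lower = [min(column) for column in zip(*series_set)]
    upper = [max(column) for column in zip(*series_set)]
    return lower, upper


def tile_map(series_set: Sequence[Sequence[float]], time_bins: int,
             value_edges: Sequence[float]) -> List[List[int]]:
    """Count time series points per tile. Rows are value intervals
    [value_edges[r], value_edges[r + 1]), with the last interval closed
    on the right; columns are equal-width bins over the time axis."""
    n = len(series_set[0])
    rows = len(value_edges) - 1
    grid = [[0] * time_bins for _ in range(rows)]
    for series in series_set:
        for t, v in enumerate(series):
            col = t * time_bins // n  # always < time_bins because t < n
            for r in range(rows):
                is_last = r == rows - 1
                if value_edges[r] <= v < value_edges[r + 1] or (is_last and v == value_edges[-1]):
                    grid[r][col] += 1
                    break  # values outside all edges are simply not counted
    return grid
```

The band returned by `mbts` could be drawn as a shaded region, while each cell of the `tile_map` grid would be mapped to a shading intensity proportional to its count.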

The bundle summary is driven by our recently proposed BTSR-tree index [8]. This is a spatial-first hybrid index, in the sense that it is primarily built on the spatial attribute of the data. More specifically, it is an extension of the R-tree spatial index [9], offering efficient support for similarity search over geolocated time series. The idea behind the BTSR-tree index is to combine both spatial proximity and time series similarity. To that end, in addition to the standard MBR denoting the spatial bound of its contents, each node is augmented with an MBTS of all the time series contained in its subtree. Maintaining both kinds of bounds per node enables pruning the search space simultaneously in the spatial and the time series domains while traversing the index. To increase pruning effectiveness, the time series indexed in a given node are further partitioned into bundles on the basis of their similarity, hence achieving tighter bounds in the MBTS of these bundles. To provide prompt visualizations of summaries over geolocated time series data and to minimize latency when drawing the relevant graphic elements, we need early access to both spatial and time series information while traversing the index. For this purpose, we adapt the BTSR-tree index so that it also includes aggregates per node, i.e., the number of time series pertaining to each bundle. Subsequently, we introduce a new traversal algorithm for efficient retrieval of a given number of bundles that are the most representative in the map area.
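
The idea of pruning in both domains at once can be sketched with a simplified node structure. This is not the BTSR-tree implementation or the traversal algorithm of the paper: the classes `Bundle` and `Node`, the Euclidean lower bound in `mbts_distance` and the threshold-based `search` are assumptions chosen to show how an MBR test and an MBTS test can jointly discard a subtree.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


@dataclass
class Bundle:
    lower: List[float]  # lower bound of the MBTS
    upper: List[float]  # upper bound of the MBTS
    count: int          # number of time series in the bundle


@dataclass
class Node:
    mbr: Rect
    bundles: List[Bundle]
    children: List["Node"] = field(default_factory=list)


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def mbts_distance(query: Sequence[float], bundle: Bundle) -> float:
    """Lower bound on the Euclidean distance between the query and any
    series enclosed by the bundle's band."""
    total = 0.0
    for q, lo, hi in zip(query, bundle.lower, bundle.upper):
        if q > hi:
            total += (q - hi) ** 2
        elif q < lo:
            total += (lo - q) ** 2
    return total ** 0.5


def search(node: Node, area: Rect, query: Sequence[float],
           threshold: float, out: List[Bundle]) -> None:
    """Collect leaf bundles that intersect the area and may contain a
    series within the threshold of the query; prune everything else."""
    if not intersects(node.mbr, area):
        return  # spatial pruning
    candidates = [b for b in node.bundles if mbts_distance(query, b) <= threshold]
    if not candidates:
        return  # time series pruning
    if not node.children:
        out.extend(candidates)
    else:
        for child in node.children:
            search(child, area, query, threshold, out)
```

Because each bundle carries its own counter, the number of time series behind every reported bundle is known as soon as the node is visited, without descending to the individual series.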

The tile map summary is driven by geo-iSAX, a hybrid index we introduce in this paper. This is a time series-first index, i.e., it is primarily built in the time series domain. More specifically, it constitutes a hybrid variant of the iSAX index [3], [6], [1], augmented with spatial attributes of its nodes' children, to combine spatial and time series information. In each node, besides the SAX word that describes all its children time series, geo-iSAX also keeps the MBR that they form. To minimize the size and overlap of the MBRs, we propose a spatial splitting policy: instead of choosing the splitting dimension in a round-robin fashion (as in iSAX), it selects the dimension that produces the smallest overlap and overall size of the two generated MBRs. We introduce a traversal algorithm for applying timebox search on geolocated time series datasets that are large both vertically and horizontally. The traversal algorithm is applied on our geo-iSAX index and returns a tile map-like summary of the qualifying geolocated time series, by taking advantage of the properties of the SAX representation.
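
The spatial splitting policy can be sketched as a choice among candidate splits. The sketch assumes that, for each candidate splitting dimension, the locations that would fall into the two new nodes are already known, and that overlap and total size are combined lexicographically (overlap first, then summed area). How the two criteria are actually weighted is not stated in this excerpt, so that combination, like the function names, is an assumption.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def mbr(points: Sequence[Point]) -> Rect:
    """Minimum bounding rectangle of a non-empty set of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two rectangles (0 if disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def choose_split_dimension(
        candidates: Dict[int, Tuple[Sequence[Point], Sequence[Point]]]) -> Optional[int]:
    """Pick the dimension whose split yields the least MBR overlap,
    breaking ties by the smallest total MBR area. Candidates that leave
    one side empty are skipped because they do not separate anything."""
    best = None
    for dim, (left, right) in candidates.items():
        if not left or not right:
            continue
        a, b = mbr(left), mbr(right)
        cost = (overlap_area(a, b), area(a) + area(b))
        if best is None or cost < best[0]:
            best = (cost, dim)
    return None if best is None else best[1]
```

A round-robin policy would pick the next dimension regardless of where the series are located, whereas this selection favors splits whose two children cover compact, well-separated regions of the map.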

To the best of our knowledge, this is the first work that considers visual exploration and summarization of geolocated time series. Specifically, we propose two summarization methods enabling efficient map-based exploration driven by suitable hybrid indices. The work in this paper builds upon and extends our effort in [10], which showed the benefits of using BTSR-tree for visual exploration of geolocated time series. Here, we introduce another novel summarization approach that enables a different exploration method, by employing a time series-first hybrid index. In brief, our main contributions are as follows:

• We suggest an adapted variant of the BTSR-tree index, as well as a novel algorithm for its traversal in order to quickly retrieve summaries (a.k.a. bundles) of geolocated time series within a given spatial area.
• We propose a hybrid variant of the iSAX index, called geo-iSAX, which combines time series with spatial information within its nodes. Based on that, we describe a novel traversal algorithm for geo-iSAX that enables fast timebox search by performing efficient pruning, while avoiding false negatives.
• We exemplify the proposed visualization methods with two use cases based on real-world datasets. In addition, we empirically evaluate the performance of our summarization methods, confirming their low execution time against a large synthetic dataset of geolocated time series.

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 outlines basic concepts and formulates the problem. Sections 4 and 5 introduce our methods for efficient visual exploration of geolocated time series by harnessing the potential of the BTSR-tree and geo-iSAX indices, respectively. Section 6 presents indicative use cases with map visualizations and also reports performance results from our empirical study. Finally, Section 7 concludes the paper and outlines future research directions.

Section snippets

Related work

In the following, we review existing approaches regarding indexing and visual exploration of time series.

Indexing of time series. Earlier approaches towards indexing of time series data were often based on leveraging multi-resolution representations. For instance, the Discrete Wavelet Transform [11] is used in [5] to gradually reduce the dimensionality of time series data via the Haar wavelet and generate an index using the coefficients of the transformed sequences. In [12], it is further

Problem formulation

A time series is a time-ordered sequence of values $T = \{v_{1}, \dots, v_{n}\}$, where $v_{i}$ is the value at the $i$-th time point and $n$ is the length of the series. In this work, we specifically deal with geolocated time series [8], i.e., time series that are additionally characterized by a location, denoted by $T.loc$. Assuming a 2-dimensional space, we further use the notation $T.loc_{x}$, $T.loc_{y}$ to refer to the $(x, y)$ coordinates of $T$'s location. In the rest of the paper, when it is clear from the context, we also refer

Computing bundle summaries

Intuitively, the first visualization method displays the bundle summaries for a spatial area of interest, as defined in Section 3.1. This may concern the currently visible area on a map, so a set of time series patterns and their respective spatial extents are computed and visualized. Using this process, a user can select the bundle of her preference and the proper spatial summary will appear on the map after acquiring the necessary MBRs from the BTSR-tree index. Whenever the user zooms in/out

Computing tile map summaries

In the following, we present our second visualization method, which allows the user to draw one or more timeboxes on the time series domain. This triggers a traversal of our hybrid geo-iSAX index to obtain the geolocated time series in the currently visible map area and also fully contained within these timeboxes. The result comes in the form of tiles, each spanning between two iSAX breakpoints, along with a count per tile indicating the number of time series whose SAX symbol resides within

Experimental evaluation

In this section, we evaluate the proposed visualization methods. We first describe our experimental setup including the datasets that we use in the evaluation. Next, we present illustrative visualizations over real-world geolocated time series, as well as scalability results against a synthetic dataset containing 4 million geolocated time series.

Conclusions and future work

In this paper, we introduced methods for map-based visual exploration over large geolocated time series data. To that end, we proposed two summarization approaches over geolocated time series, which allow a visual analytics application to retrieve the required information. The results can be displayed on a map, depicting the spatial distribution of the data in the form of MBRs for both approaches. Each approach also provides a time series summary, via time series bundles or tile maps

Acknowledgements

This work was partially funded by the EU H2020 projects SLIPO (731581), SmartDataLake (825041) and the NSRF 2014-2020 project HELIX (5002781).

References (41)

H. Hochheiser et al. Interactive exploration of time series data.
A. Camerra et al. Beyond one billion time series: indexing and mining very large time series collections with iSAX2+. Knowl. Inf. Syst. (2014).
H. Ding et al. Querying and mining of time series data: experimental comparison of representations and distance measures. Proc. VLDB Endow. (2008).
J. Shieh et al. iSAX: indexing and mining terabyte sized time series.
P. Chronis et al. Open issues and challenges on time series forecasting for water consumption.
K. Chan et al. Efficient time series matching by wavelets.
A. Camerra et al. iSAX 2.0: indexing and mining one billion time series.
K. Zoumpatianos et al. Indexing for interactive exploration of big data series.
G. Chatzigeorgakidis et al. Indexing geolocated time series data.
A. Guttman. R-trees: a dynamic index structure for spatial searching.
G. Chatzigeorgakidis et al. Map-based visual exploration of geolocated time series.
A. Graps. An introduction to wavelets. IEEE Comput. Sci. Eng. (1995).
I. Popivanov et al. Similarity search over time-series data using wavelets.
S. Kashyap et al. Scalable kNN search on vertically stored time series.
J. Lin et al. Experiencing SAX: a novel symbolic representation of time series. Data Min. Knowl. Discov. (2007).
E.J. Keogh et al. Dimensionality reduction for fast similarity search in large time series databases. Knowl. Inf. Syst. (2001).
B. Yi et al. Fast time sequence indexing for arbitrary Lp norms.
T. Palpanas. Big sequence management: a glimpse of the past, the present, and the future.
L. Chen et al. Spatial keyword query processing: an experimental evaluation. Proc. VLDB Endow. (2013).
M. Christoforaki et al. Text vs. space: efficient geo-search query processing.
