1 Introduction

The development of sensor networks has resulted in an explosion of time series data in recent years. These are large multivariate time series, in which variables are collected synchronously over time. Pattern mining of multivariate time series is therefore becoming highly relevant, both in scientific research and in industrial applications. In the multivariate setting, not only are patterns in one variable over time relevant, but relationships between multiple variables can also provide useful insights. This task can be seen as clustering both time periods and variables, also known as biclustering [8–10].

Given a multivariate time series, it could be useful to try different biclustering algorithms. Also, one needs to optimize parameters across different steps, such as pre-processing, segmentation and biclustering itself. For each of these steps, there are many parameters to be optimized, leading to a large number of experiments. Furthermore, at each step, visual inspection is highly important for researchers to validate their findings. However, there is a lack of tools for this process.

We propose Bipeline, a web-based visualization tool that provides a pipeline for applying biclustering to multivariate time series. The tool is readily accessible to anyone via a web-based interface, allowing users to navigate through multiple experimental settings. Parameters can be tuned interactively with web components such as checkboxes, sliders and drop-down menus. At each step of the biclustering process, feedback is provided by means of visualizations, with plots of the pre-processed time series, segmentation boundaries and biclusters. One or more biclusters can be plotted with a simple selection procedure.

2 Related Work

Until now, biclustering software tools with a graphical user interface have been developed to deal with biological gene expression data. BicOverlapper [1] is a tool for visual inspection of gene expression biclusters, which introduces a novel visualization algorithm, Overlapper, to represent biclusters. Similarly, BiCluster Viewer [2] is a visualization tool for efficient and interactive analysis of large gene expression datasets. BicAT [3] implements multiple biclustering algorithms for the visualization and analysis of biclusters in expression data. BiGGEsTS [4] provides an environment for biclustering time series gene expression data.

All tools mentioned above integrate techniques for pre-processing and biclustering analysis, specifically for gene expression data. Their main purpose is to support biologists in the analysis and exploration of gene expression data. However, these tools do not support biclustering analysis for multivariate time series. Also, most of them do not provide a pipelined experiment environment. Bipeline provides such a pipeline, where intermediate results can be inspected and saved. Using a friendly and interactive plotting environment, both non-experts and experts can pre-process and segment multivariate time series and analyze the resulting biclusters.

3 Tool Overview

Bipeline is a web-based application that provides a pipeline to pre-process, segment and bicluster multivariate time series. An online version is available [12], which is compatible with all modern web browsers and across different client platforms. Both the user interface in the web browser and the server are implemented using the R Shiny package [5]. The system architecture in Fig. 1 illustrates the experimental pipeline and how each individual step relates to the other steps:

Fig. 1. An overview of the Bipeline architecture.

Importing: Users can upload datasets and have a first view of the data table and descriptive statistics (minimum, maximum, mean, ...). This first inspection, although useful, is not enough to assess the quality of the data.

Plotting: To gain further insight, visual inspection of the time series is crucial. The plotting panel includes multiple interactive plotting views, built with the R plotting package dygraphs [6]. An example of these plots is shown in Fig. 2(a). The interactive plots support zooming in and out, which is highly desirable for visual inspection of large time series.

Fig. 2. Bipeline user interface. (Color figure online)

Pre-processing: This panel allows preliminary handling of the data, such as excluding variables, normalization, conditional removal and replacement of data, and outlier removal. Users can alternate between plotting (Fig. 2(a)) and pre-processing (Fig. 2(b)) until satisfied, then export the pre-processed data by clicking the Save button.
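To make the pre-processing operations concrete, the following is a minimal Python sketch of three of them: excluding variables, normalization and outlier removal. It is an illustration only and not Bipeline's code (Bipeline is implemented in R). The function names, the choice of z-score normalization and the rule of masking points more than k standard deviations from the mean are all assumptions made for this example.

```python
from statistics import mean, pstdev


def zscore(values):
    """Scale one variable to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant variable carries no variation; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def mask_outliers(values, k=3.0):
    """Replace points more than k standard deviations from the mean with None.

    The k-sigma rule is an assumed criterion, chosen here for simplicity.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return list(values)
    return [v if abs(v - mu) <= k * sigma else None for v in values]


def preprocess(table, exclude=(), k=3.0):
    """Clean a table that maps variable name -> list of readings.

    Excluded variables are dropped; the rest are normalized and then
    have their outliers masked.
    """
    cleaned = {}
    for name, values in table.items():
        if name in exclude:
            continue
        cleaned[name] = mask_outliers(zscore(values), k)
    return cleaned
```

For example, `preprocess({"temp": readings, "id": ids}, exclude=("id",))` would return only the normalized `temp` series, where `readings` and `ids` are hypothetical lists of equal length.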

Segmentation: This panel allows segmentation of the data, one of the steps required for biclustering as suggested by [10]. By default, all variables share the same parameter settings: window size, overlap and threshold, all of which can be easily tuned. For greater flexibility, the user can dynamically create new tabs to set the parameters for individual variables. Additionally, a minimum segment size can be set, and the tool will merge each short segment into its most similar contiguous segment. Segmentation results can be visualized (Fig. 2(c)), saved and (re-)loaded, allowing the results to be used during the next step, biclustering.
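As a rough illustration of how window size, overlap, threshold and minimum segment size might interact, the Python sketch below places a boundary wherever consecutive sliding windows differ by more than the threshold, and then merges short segments into their most similar neighbour. The change score (difference of consecutive window means) and the similarity measure (difference of segment means) are stand-ins chosen for brevity. They are not the method of [10] and not Bipeline's implementation, and all function names are hypothetical.

```python
def window_starts(n, window, overlap):
    """Start indices of sliding windows of length `window` sharing `overlap` points."""
    step = window - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than window")
    return list(range(0, n - window + 1, step))


def segment(values, window, overlap, threshold):
    """Split one variable into segments, returned as (start, end) pairs, end exclusive."""
    starts = window_starts(len(values), window, overlap)
    means = [sum(values[s:s + window]) / window for s in starts]
    boundaries = [0]
    for i in range(1, len(starts)):
        # Assumed change score: absolute difference of consecutive window means.
        if abs(means[i] - means[i - 1]) > threshold:
            boundaries.append(starts[i])
    boundaries.append(len(values))
    return list(zip(boundaries[:-1], boundaries[1:]))


def seg_mean(values, seg):
    """Mean of the values inside one (start, end) segment."""
    start, end = seg
    return sum(values[start:end]) / (end - start)


def merge_short(values, segments, min_size):
    """Merge every segment shorter than min_size into its most similar neighbour."""
    segs = list(segments)
    while len(segs) > 1:
        short = [i for i, (s, e) in enumerate(segs) if e - s < min_size]
        if not short:
            break
        i = short[0]
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(segs)]
        m = seg_mean(values, segs[i])
        # Assumed similarity: the neighbour whose mean is closest wins.
        j = min(neighbours, key=lambda idx: abs(seg_mean(values, segs[idx]) - m))
        lo, hi = min(i, j), max(i, j)
        segs[lo:hi + 1] = [(segs[lo][0], segs[hi][1])]
    return segs
```

With overlapping windows, a single level shift is typically flagged by several consecutive windows, producing a run of very short segments; this is the situation in which a minimum segment size with merging is useful.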

Biclustering: In Bipeline, we implement a number of biclustering algorithms, grouped into three categories. The baseline algorithms allow users to try well-known biclustering algorithms (e.g., Cheng & Church) [8, 9], which have been implemented using the R package biclust [7]. Segmentation + Baseline biclusters the time series using an average representation of each segment, instead of using individual rows. Segmentation + BiclusTS is a novel algorithm [10] introduced to recognize similarities between segments, using probability density-difference estimation [11]. All biclusters are plotted as colored blocks, as shown in Fig. 2(d). Users can select the biclusters they want to see, and the plot is updated in real time.
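The idea behind Segmentation + Baseline can be sketched in Python as follows: each segment is reduced to one row of per-variable averages, and a baseline algorithm is then run on that smaller matrix. The sketch also computes the mean squared residue, the coherence score that Cheng & Church biclustering minimizes, for a candidate set of rows and columns. This is an illustration under assumptions: it does not reproduce the biclust package or Bipeline's code, the function names are hypothetical, and the segments are assumed to be shared by all variables.

```python
def segment_means(table, segments):
    """Average representation: one row per segment, one column per variable.

    `table` maps variable name -> list of readings; `segments` is a list of
    (start, end) pairs with end exclusive. Columns follow sorted variable names.
    """
    names = sorted(table)
    return [[sum(table[name][s:e]) / (e - s) for name in names]
            for (s, e) in segments]


def mean_squared_residue(matrix, rows, cols):
    """Cheng & Church score H(I, J) of the submatrix given by `rows` and `cols`.

    A score of 0 means every entry equals row effect + column effect + constant.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)
```

Working on segment averages rather than individual time points shrinks the number of rows from the length of the series to the number of segments, which is the main practical difference from running a baseline algorithm directly on the raw rows.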

Several features are shared by Segmentation and Biclustering. Plots and parameter tables from different experiments are kept in a history, allowing users to navigate back and forth to compare results and optimize parameters. During computationally expensive tasks, the front-end displays a progress bar while the back-end server carries out the calculations. Furthermore, interactive web components can be saved as images with a single click.

4 Conclusion

We propose Bipeline, a web-based visualization tool that provides a pipeline for applying biclustering to multivariate time series. Its main features include visual inspection at multiple stages, interactive plots with zooming, easy navigation, storage of results, and saving plots and experimental settings with a single click. Bipeline’s intuitive web-based design makes it accessible to both experts and non-experts, and compatible across platforms.