Elsevier

Big Data Research

Volume 32, 28 May 2023, 100377

Spatiotemporal Prediction Based on Feature Classification for Multivariate Floating-Point Time Series Lossy Compression

https://doi.org/10.1016/j.bdr.2023.100377

Abstract

A large amount of time series data is produced by the widespread use of IoT devices and sensors. Time series compression is widely adopted to reduce storage overhead and transport costs. At present, most state-of-the-art approaches focus on univariate time series, so compressing multivariate time series (MTS) remains an important but challenging problem. Traditional MTS compression methods treat each variable individually, ignoring the correlations across variables. This paper proposes a novel MTS prediction method that can be applied to MTS compression to achieve a higher compression ratio. The method extracts the spatial and temporal correlation across multiple variables, yielding more accurate predictions and improving the lossy compression performance of MTS under the prediction-quantization-entropy framework. We use a convolutional neural network (CNN) to extract the temporal features of all variables within the window length. The features generated by the CNN are then transformed, and an image classification algorithm extracts the spatial features of the transformed data. Predictions are made according to these spatiotemporal characteristics. To enhance the robustness of our model, we integrate an autoregressive (AR) linear model in parallel with the proposed network. Experimental results demonstrate that our work improves the prediction accuracy of MTS and the MTS compression performance in most cases.

Introduction

With the frequent use of sensors, IoT devices, and smart machines, the amount of measurement data is increasing dramatically. Time series data account for a large proportion of these measurement data. Any part of a time series may affect some downstream application, so transmitting and storing massive time series data is not an easy task [5], [16]. There is an urgent need for time series compression to reduce the volume of time series data. By compressing and reconstructing time series data, practitioners can analyze and mine these data at any time. More and more effort has gone into time series compression [11], mainly focused on compressing univariate time series, both short and long. Some methods also exist to compress multivariate time series, but they mainly deal with each variable separately. The variables are thus treated as independent, and the correlation across variables is not leveraged. Therefore, finding a method designed specifically to compress MTS is an important task.

In addition to using traditional compression methods for MTS processing, there are two advanced methods to compress MTS, based respectively on sparse dictionary coding [19], [25] and neural networks [2]. To the best of our knowledge, the former is the first to use correlation across variables to compress MTS, and the latter is the first compression model that compresses time series data with a neural network-based prediction method. Its prediction stage combines a linear prediction method with a neural network-based prediction method. Unfortunately, the former compresses MTS data using only the spatial correlation between variables, and the latter can only compress univariate time series data.

In several cases, time series data consist of floating-point values, which typically contain measurement noise. Thus, by adequately handling noise, lossy compression can improve the performance of downstream applications [17] without causing considerable losses. Also, compared with lossless compression techniques, lossy compression achieves a higher compression ratio to some extent. Moreover, it is possible to ensure that the original data can be reconstructed within a maximum user-defined error range and still be accepted by downstream applications. Finally, MTS data are highly correlated, as they often capture multiple facts about the same phenomenon. As shown in Fig. 1, there is a spatiotemporal correlation across variables. Making good use of this correlation can improve MTS compression performance. It is therefore crucial to use a spatiotemporal prediction method for MTS data prediction and then realize multivariate floating-point time series lossy compression.
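To make the user-defined error range concrete, the following is a minimal sketch of error-bounded residual quantization. It is not the paper's implementation: the previous-value predictor stands in for a real prediction model, and the names `quantize`, `dequantize`, and `error_bound` are hypothetical.

```python
def quantize(values, error_bound):
    """Encode values as integer codes of prediction residuals.

    Assumption: a previous-value predictor stands in for the real model.
    """
    step = 2.0 * error_bound  # bin width, so the rounding error is at most error_bound
    codes = []
    prev = 0.0  # encoder and decoder start from the same state
    for v in values:
        code = round((v - prev) / step)  # integer bin of the residual
        codes.append(code)
        prev = prev + code * step  # track the reconstructed value, as the decoder will
    return codes


def dequantize(codes, error_bound):
    """Rebuild the approximate values from the integer codes."""
    step = 2.0 * error_bound
    out = []
    prev = 0.0
    for code in codes:
        prev = prev + code * step
        out.append(prev)
    return out


signal = [20.0, 20.13, 20.31, 20.28, 19.97]
eb = 0.05
codes = quantize(signal, eb)
restored = dequantize(codes, eb)
max_error = max(abs(a - b) for a, b in zip(signal, restored))
```

Because the encoder predicts from the reconstructed value rather than the original one, quantization errors do not accumulate, and `max_error` stays within `eb` up to floating-point rounding. A more accurate predictor yields smaller residuals and therefore smaller integer codes, which are cheaper to entropy-code.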

Note that state-of-the-art time series prediction methods are unsuitable for the above compression framework. The mainstream time series prediction models are autoregressive (AR) models and nonlinear autoregressive exogenous (NARX) models. Most of these approaches are either limited to linear univariate time series, and thus unsuitable for MTS data, or need to employ predetermined nonlinearities, which makes it hard to recognize the different forms of nonlinear relationships in MTS data. Methods based on neural networks [15], [21], [23], [29], [30] have received significant attention due to their ability to capture nonlinear interdependencies. However, these methods are better suited to machine translation, image captioning, and document classification than to prediction, because neural networks tend to choose related time steps rather than relevant time series or variables to make predictions. Moreover, these methods are mainly designed for classification rather than time series prediction. Both traditional prediction methods and deep learning methods focus on leveraging temporal characteristics and often neglect other dynamic relationships in MTS data, such as spatiotemporal correlations. Focusing more on relevant variables and their spatiotemporal relationship is critical for MTS data prediction.

Our work touches on two concepts: variable correlation and time series prediction. To compress MTS and further optimize the compression framework, we adopt Prediction-Quantization-Entropy [2] as our compression framework. Most of our effort is focused on the prediction module, where we exploit the spatiotemporal correlation across variables to improve both prediction accuracy and compression ratio. Note that the nonlinear temporal and spatial correlation between different variables over time is very complex. Exploiting this complex spatiotemporal correlation and applying it to MTS data prediction and MTS data compression is the core of our research work. This paper proposes a prediction method based on feature classification. We use a convolutional neural network (CNN) [27] to extract the temporal correlation across variables within the window length. We then decompose the resulting convolution matrix by rows and divide each row into two-dimensional tensors to extract spatial correlation and further extract temporal correlation. Next, we apply image classification [14], [34], [35] to those tensors to assign a weight to each feature. The classification weights select the variables that are helpful for forecasting. Since the context vector is now the weighted sum of the row vectors containing information across multiple time steps, it captures spatiotemporal information of different variables. To further improve robustness, we integrate an autoregressive (AR) linear model in parallel with the nonlinear neural network. We use CNN to extract the temporal correlation of time series because its convolutional layers discover local time dependencies among variables. In addition, the area scanned by a convolution kernel shares the filter's parameters, which significantly reduces the number of parameters and shortens the training time of the model. That is, CNN is less time-consuming than LSTM. Our work mainly revolves around the spatiotemporal correlation of multivariate time series. For simplicity, we abbreviate our prediction method as STBFC (spatiotemporal prediction based on feature classification) and our compression method as STCmp (spatiotemporal compression). The main contributions of this paper are summarized as follows:

(1) We introduce a new pre-processing method for multivariate time series to classify variables better and extract spatiotemporal correlations across variables.

(2) Instead of selecting appropriate time steps for prediction, we focus on selecting relevant variables by giving higher classification weight to those variables. We introduce image classification algorithms to assign weight to variables according to their spatial correlation. In addition to excluding irrelevant attributes and selecting relevant variables, the algorithms also extract spatial correlations between variables.

(3) We extract the spatiotemporal correlation of MTS to achieve a more accurate prediction of MTS. Compared with other methods, our method achieves state-of-the-art performance on real datasets such as Electricity. Applying STBFC to the prediction-quantization-entropy compression framework achieves a higher compression ratio than other time series compression methods.
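The introduction mentions running a linear AR model in parallel with the nonlinear network. As a rough illustration only, the sketch below fits an order-1 AR coefficient per variable by least squares and adds its forecast to a nonlinear forecast. The additive combination, the order of 1, and the names `fit_ar1` and `combined_forecast` are assumptions; the text shown here does not specify how the two branches are merged.

```python
def fit_ar1(series):
    """Least-squares coefficient phi for x_t ~ phi * x_{t-1} (no intercept)."""
    num = sum(cur * prev for cur, prev in zip(series[1:], series[:-1]))
    den = sum(prev * prev for prev in series[:-1])
    return num / den if den else 0.0


def combined_forecast(window, nonlinear_pred):
    """Add a per-variable AR(1) forecast to a nonlinear forecast.

    window: list of n variables, each a list of t past values.
    nonlinear_pred: list of n values from some nonlinear model (placeholder here).
    Assumption: the two branches are simply summed.
    """
    out = []
    for series, nl in zip(window, nonlinear_pred):
        phi = fit_ar1(series)
        out.append(nl + phi * series[-1])  # linear branch uses the latest value
    return out


window = [
    [1.0, 2.0, 4.0, 8.0],   # variable 1
    [3.0, 3.0, 3.0, 3.0],   # variable 2
]
nonlinear_pred = [0.5, -0.25]  # stand-in for the network output
forecast = combined_forecast(window, nonlinear_pred)
```

The linear branch keeps the output sensitive to the scale of recent inputs, which is the usual motivation for pairing an AR model with a neural network.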

This paper is organized as follows. Section 2 reviews some related work on time series prediction and time series compression. Section 3 briefly presents some preliminaries related to our proposed model. We introduce the MTS pre-processing method and STBFC in Section 4. We validate STBFC with experimental results of the prediction of MTS and apply it to the MTS compression framework in Section 5. Section 6 concludes this paper.

Section snippets

Related work

Our work in this paper is closely related to two lines of work: time series prediction and time series compression.

Preliminaries

The Prediction-Quantization-Entropy framework is used as a basic framework to implement compression in this paper. In addition to the Prediction-Quantization-Entropy framework, we briefly describe Squeeze-and-Excitation Network [14], which has an excellent performance in image classification and is closely related to our proposed prediction method.
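For readers unfamiliar with the Squeeze-and-Excitation idea, here is a minimal sketch of a single SE block in plain Python: a global average per channel (squeeze), two small fully connected layers with ReLU and sigmoid (excitation), and a per-channel rescaling. The random weights, the shapes, the reduction ratio, and the name `se_block` are illustrative assumptions; this is not the network used in the paper.

```python
import math
import random


def se_block(channels, w1, w2):
    """Apply one squeeze-and-excitation step.

    channels: list of C feature maps, each a list of H rows of W floats.
    w1: (C // r) x C weight matrix, w2: C x (C // r) weight matrix (biases omitted).
    Returns the per-channel weights and the rescaled feature maps.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in channels]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    weights = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value of a channel by that channel's weight.
    scaled = [
        [[weights[c] * x for x in row] for row in ch]
        for c, ch in enumerate(channels)
    ]
    return weights, scaled


rng = random.Random(0)  # fixed seed for repeatability
C, H, W, r = 4, 2, 3, 2  # channels, height, width, reduction ratio (all hypothetical)
channels = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[rng.uniform(-1.0, 1.0) for _ in range(C)] for _ in range(C // r)]
w2 = [[rng.uniform(-1.0, 1.0) for _ in range(C // r)] for _ in range(C)]
weights, scaled = se_block(channels, w1, w2)
```

Each weight lies strictly between 0 and 1, so the block can emphasize informative channels and damp the others. In a trained network `w1` and `w2` are learned rather than random.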

Notation and problem statement

Multiple variables are often used to describe a complex system. For the same object, the values of these variables can be obtained from different angles at different times. We use $t$ and $n$ as the total length of time (window length or time step) and the number of variables, respectively. The matrix $X = (x^1, x^2, \ldots, x^n) = (x_1, x_2, \ldots, x_t)^T \in \mathbb{R}^{n \times t}$ can be applied as a mathematical representation of MTS, in which $x^k = (x_1^k, x_2^k, \ldots, x_t^k)^T \in \mathbb{R}^t$ denotes the $k$th variable of the time series data and $x_t = (x_t^1, x_t^2, \ldots, x_t^n) \in \mathbb{R}^n$

Experiments

To verify the ability of STBFC for MTS forecasting, we first describe the datasets upon which our experiments are conducted and introduce our evaluation metrics. Then, we present our experimental results and a visualization of the prediction against other methods. Finally, we apply STBFC to the prediction-quantization-entropy framework to compress MTS data. We present the compression results of STCmp compared with other compression methods.

Conclusion

This paper presents a new multivariate time series prediction method. According to the spatiotemporal correlation between variables in MTS, we classify the variables and combine them with their classification weights to achieve prediction. We apply STBFC to the prediction-quantization-entropy framework to compress multivariate time series. A number of real datasets were used to evaluate STBFC. Compared to state-of-the-art techniques, our proposed method can improve prediction accuracy and

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (39)

  • Rahul Dey et al., Gate-variants of gated recurrent unit (GRU) neural networks
  • Frank Eichinger et al., A time-series compression technique and its application to the smart grid, VLDB J. (2015)
  • J.H. Friedman, Greedy function approximation: a gradient boosting machine, Ann. Stat. (2001)
  • G. Chiarot et al., Time series compression: a survey
  • B. Gui et al., Financial time series forecasting using support vector machine
  • Guilin Liu et al., Partial convolution based padding
  • J. Hu et al., Squeeze-and-excitation networks
  • Hua Qu et al., Long short-term memory network prediction model based on fuzzy time series
  • Søren Kejser Jensen et al., Time series management systems: a survey, IEEE Trans. Knowl. Data Eng. (2017)

    1

    Co-first author.
