Spatiotemporal Prediction Based on Feature Classification for Multivariate Floating-Point Time Series Lossy Compression
Introduction
With the widespread use of sensors, IoT devices, and smart machines, the amount of measurement data is increasing dramatically. Time series account for a large proportion of these data. Any part of a time series may affect some downstream application, so transmitting and storing massive time series is not an easy task [5], [16]. There is an urgent need for time series compression to reduce the volume of these data. By compressing and reconstructing time series, practitioners can analyze and mine the data at any time. A growing body of work addresses time series compression [11], focusing mainly on univariate time series, both short and long. Some methods also compress multivariate time series (MTS), but they mostly handle each variable separately. The variables are thus treated as independent, and the correlation across variables is not leveraged. Therefore, finding a compression method designed specifically for MTS is an important task.
In addition to traditional compression methods applied to MTS, there are two more advanced approaches, based respectively on sparse dictionary coding [19], [25] and neural networks [2]. To the best of our knowledge, the former is the first to use correlation across variables to compress MTS, and the latter is the first compression model that compresses time series data with a neural-network-based prediction method. Its prediction stage offers both a linear prediction method and a neural-network-based prediction method. Unfortunately, the former compresses MTS data using only the spatial correlation between variables, and the latter can only compress univariate time series data.
In several cases, time series consist of floating-point data, which typically contain measurement noise. Thus, by adequately handling noise, lossy compression can improve the performance of downstream applications [17] without causing considerable losses. Also, compared with lossless compression techniques, lossy compression achieves a higher compression ratio. Moreover, the original data can be reconstructed within a maximum user-defined error bound, so that they remain acceptable to downstream applications. Finally, MTS data are highly correlated, as they often capture multiple aspects of the same phenomenon. As shown in Fig. 1, there is a spatiotemporal correlation across variables. Making good use of this correlation can improve MTS compression performance. It is therefore crucial to use a spatiotemporal prediction method for MTS data and, on that basis, realize lossy compression of multivariate floating-point time series.
Note that state-of-the-art time series prediction methods are unsuitable for the above compression framework. The mainstream time series prediction models are autoregressive and nonlinear autoregressive exogenous models. Most of these approaches are either limited to linear univariate time series, and thus unsuitable for MTS data, or require predetermined nonlinearities, which makes it difficult to capture the different forms of nonlinear relationships in MTS data. Methods based on neural networks [15], [21], [23], [29], [30] have received significant attention due to their ability to capture nonlinear interdependencies. However, these methods are better suited to machine translation, image captioning, and document classification than to prediction, because such networks tend to select related time steps rather than relevant time series or variables when making predictions. Moreover, these methods are mainly designed for classification rather than time series prediction. Both traditional prediction methods and deep-learning-based methods focus on temporal characteristics and often neglect other dynamic relationships in MTS data, such as spatiotemporal correlations. Focusing more on relevant variables and their spatiotemporal relationships is critical for MTS data prediction.
Our work touches on two concepts: variable correlation and time series prediction. To compress MTS and further optimize the compression framework, we adopt Prediction-Quantization-Entropy [2] as our compression framework. Most of our effort focuses on the prediction module, where we exploit the spatiotemporal correlation across variables to improve both prediction accuracy and compression ratio. Note that the nonlinear temporal and spatial correlation between different variables over time is very complex. Exploiting this complex spatiotemporal correlation and applying it to MTS data prediction and MTS data compression is the core of our research work.
This paper proposes a prediction method based on feature classification. We use a Convolutional Neural Network (CNN) [27] to extract the temporal correlation across variables within the window length. We then decompose the resulting convolution matrix by rows and divide each row into two-dimensional tensors to extract spatial correlation and further extract temporal correlation. Next, we apply image classification [14], [34], [35] to those tensors to assign a weight to each feature. These classification weights select the variables that are helpful for forecasting. Since the context vector is the weighted sum of row vectors that contain information across multiple time steps, it captures the spatiotemporal information of different variables. To further improve robustness, we integrate a linear autoregressive (AR) model in parallel with the nonlinear neural network. We use a CNN to extract the temporal correlation of time series because its convolutional layers discover local time dependencies among variables. In a CNN, every position scanned by a convolution kernel shares the same filter parameters. This sharing significantly reduces the number of parameters and shortens the training time of the model; that is, a CNN is less time-consuming than an LSTM. Our work mainly revolves around the spatiotemporal correlation of multivariate time series. For simplicity, we abbreviate our prediction method as STBFC (spatiotemporal prediction based on feature classification) and our compression method as STCmp (spatiotemporal compression). The main contributions of this paper are summarized as follows:
(1) We introduce a new pre-processing method for multivariate time series to classify variables better and extract spatiotemporal correlations across variables.
(2) Instead of selecting appropriate time steps for prediction, we focus on selecting relevant variables by giving them higher classification weights. We introduce image classification algorithms to assign weights to variables according to their spatial correlation. In addition to excluding irrelevant attributes and selecting relevant variables, the algorithms also extract spatial correlations between variables.
(3) We extract the spatiotemporal correlation of MTS to achieve more accurate MTS prediction. On real datasets such as Electricity, our method achieves state-of-the-art performance compared with other methods. Applying STBFC to the Prediction-Quantization-Entropy compression framework yields a higher compression ratio than other time series compression methods.
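The prediction steps outlined in this introduction (a shared convolution kernel over time, a weight per variable, a weighted-sum context vector, and a parallel AR term) can be illustrated with a minimal, untrained sketch. This is not the authors' STBFC implementation: the gating below is a simplified stand-in for the classification-based weighting, and all function names, the `gate_scale` and `gate_bias` parameters, and the toy coefficients are assumptions made for illustration only.

```python
import math


def conv1d_valid(series, kernel):
    """Slide one shared kernel over a series (no padding)."""
    k = len(kernel)
    return [
        sum(kernel[j] * series[i + j] for j in range(k))
        for i in range(len(series) - k + 1)
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def variable_weights(feature_rows, gate_scale=1.0, gate_bias=0.0):
    """Squeeze each row to its mean absolute activation, then gate it.

    gate_scale and gate_bias are hypothetical stand-ins for learned parameters.
    """
    squeezed = [sum(abs(v) for v in row) / len(row) for row in feature_rows]
    return [sigmoid(gate_scale * s + gate_bias) for s in squeezed]


def context_vector(feature_rows, weights):
    """Weighted sum of the row vectors, one entry per time position."""
    length = len(feature_rows[0])
    return [
        sum(w * row[i] for w, row in zip(weights, feature_rows))
        for i in range(length)
    ]


def ar_forecast(series, coeffs):
    """Linear AR forecast; coeffs[0] multiplies the most recent value."""
    recent = series[::-1][: len(coeffs)]
    return sum(c * x for c, x in zip(coeffs, recent))


def predict_next(window, target, kernel, out_weights, ar_coeffs):
    """Nonlinear branch (convolution + weighted rows) plus parallel AR branch."""
    rows = [conv1d_valid(series, kernel) for series in window]
    weights = variable_weights(rows)
    context = context_vector(rows, weights)
    nonlinear = sum(o * c for o, c in zip(out_weights, context))
    linear = ar_forecast(window[target], ar_coeffs)
    return nonlinear + linear, weights


# Toy window: variable 0 trends upward, variable 1 is flat.
window = [[1.0, 2.0, 3.0, 4.0, 5.0], [0.0, 0.0, 0.0, 0.0, 0.0]]
prediction, weights = predict_next(
    window, target=0, kernel=[-1.0, 1.0],
    out_weights=[0.0, 0.0, 0.0, 0.0], ar_coeffs=[2.0, -1.0],
)
```

With the output weights set to zero, the prediction reduces to the AR term (2 × 5 − 4 = 6), and the trending variable receives a larger weight than the flat one. In a trained model, the kernel, gating parameters, and output weights would be learned rather than fixed by hand.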
This paper is organized as follows. Section 2 reviews some related work on time series prediction and time series compression. Section 3 briefly presents some preliminaries related to our proposed model. We introduce the MTS pre-processing method and STBFC in Section 4. We validate STBFC with experimental results of the prediction of MTS and apply it to the MTS compression framework in Section 5. Section 6 concludes this paper.
Related work
Our work in this paper is closely related to two lines of work: time series prediction and time series compression.
Preliminaries
The Prediction-Quantization-Entropy framework is used as the basic framework to implement compression in this paper. In addition to this framework, we briefly describe the Squeeze-and-Excitation Network [14], which performs very well in image classification and is closely related to our proposed prediction method.
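As a rough illustration of how a Prediction-Quantization-Entropy pipeline can respect a user-defined maximum error, the sketch below predicts each value, quantizes the residual uniformly with a step of twice the error bound, and estimates the cost of entropy coding. It rests on several assumptions: the predictor is a simple previous-value placeholder rather than STBFC, the entropy stage is represented only by a Shannon entropy estimate instead of an actual coder, and all function names are hypothetical.

```python
import math
from collections import Counter


def compress(series, max_error):
    """Return integer residual codes for an error-bounded encoding."""
    step = 2.0 * max_error
    codes = []
    prev_recon = 0.0  # predictor state, kept identical in the decoder
    for x in series:
        prediction = prev_recon            # placeholder previous-value predictor
        residual = x - prediction
        code = round(residual / step)      # uniform quantization of the residual
        codes.append(code)
        prev_recon = prediction + code * step  # value the decoder will rebuild
    return codes


def decompress(codes, max_error):
    """Rebuild the series from the codes using the same predictor."""
    step = 2.0 * max_error
    recon = []
    prev_recon = 0.0
    for code in codes:
        prev_recon = prev_recon + code * step
        recon.append(prev_recon)
    return recon


def entropy_bits_per_symbol(codes):
    """Shannon entropy of the codes, a lower bound for an ideal entropy coder."""
    counts = Counter(codes)
    total = len(codes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


series = [20.0, 20.1, 20.3, 20.2, 20.25, 21.0]
codes = compress(series, max_error=0.05)
recon = decompress(codes, max_error=0.05)
bits = entropy_bits_per_symbol(codes)
```

Because the encoder predicts from reconstructed values rather than from the originals, quantization errors do not accumulate, and each reconstructed value stays within `max_error` of the original up to floating-point rounding. A more accurate predictor concentrates the codes around zero, which lowers the entropy estimate and hence the compressed size.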
Notation and problem statement
Multiple variables are often used to describe a complex system. For the same object, the values of these variables can be obtained from different angles at different times. We use t and n as the total length of time (window length or time step) and the number of variables, respectively. The matrix can be applied as a mathematical representation of MTS, in which denotes the kth variables of time series data and
Experiments
To verify the ability of STBFC for MTS forecasting, we first describe the datasets upon which our experiments are conducted and introduce our evaluation metrics. Then, we present our experimental results and a visualization of the prediction against other methods. Finally, we apply STBFC to the prediction-quantization-entropy framework to compress MTS data. We present the compression results of STCmp compared with other compression methods.
Conclusion
This paper presents a new multivariate time series prediction method. According to the spatiotemporal correlation between variables in MTS, we classify the variables and combine them with their classification weights to achieve prediction. We apply STBFC to the Prediction-Quantization-Entropy framework to compress multivariate time series. A number of real datasets were used to evaluate STBFC. Compared to state-of-the-art techniques, our proposed method can improve prediction accuracy and
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (39)
- et al., Least squares based iterative parameter estimation algorithm for multivariable controlled ARMA system modelling with finite measurement data, Math. Comput. Model. (2011)
- et al., Multivariate time series forecasting via attention-based encoder-decoder framework, Neurocomputing (2020)
- et al., Statistical monitoring of nonlinear profiles by using piecewise linear approximation, J. Process Control (2011)
- et al., DSTP-RNN: a dual-stage two-phase attention-based recurrent neural network for long-term and multivariate time series prediction, Expert Syst. Appl. (2020)
- et al., Real-time road traffic prediction with spatio-temporal correlations, Transp. Res., Part C, Emerg. Technol. (2011)
- et al., Systematic evaluation of convolution neural network advances on the imagenet, Comput. Vis. Image Underst. (2017)
- et al., LFZip: lossy compression of multivariate floating-point time series data via improved prediction
- et al., Embedding distributions and Chebyshev polynomials, Graphs Comb. (2012)
- et al., Adaptive predistortion with direct learning based on piecewise linear approximation of amplifier nonlinearity, IEEE J. Sel. Top. Signal Process. (2009)
- et al., A fast lightweight time-series store for IoT data
- Gate-variants of gated recurrent unit (GRU) neural networks
- A time-series compression technique and its application to the smart grid, VLDB J.
- Greedy function approximation: a gradient boosting machine, Ann. Stat.
- Time series compression: a survey
- Financial time series forecasting using support vector machine
- Partial convolution based padding
- Squeeze-and-excitation networks
- Long short-term memory network prediction model based on fuzzy time series
- Time series management systems: a survey, IEEE Trans. Knowl. Data Eng.
- 1
Co-first author.