Abstract
A multivariate time series is one of the most important objects of research in data mining. Time and variables are its two distinctive characteristics, and both complicate the algorithms applied to mine such data. Dimensionality reduction is often regarded as an effective way to address these issues. In this paper, we propose a method based on principal component analysis (PCA) to reduce the dimensionality effectively. We call it “piecewise representation based on PCA” (PPCA): it segments a multivariate time series into several sequences, calculates the covariance matrix of the variables for each sequence, and applies PCA to the average covariance matrix to obtain the principal components. Experiments covering retained-information analysis, classification, and a comparison of central processing unit (CPU) time consumption demonstrate that PPCA is superior to prior methods for reducing the dimensionality of multivariate time series.
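The three steps named in the abstract (segmentation, per-segment covariance, PCA on the average covariance matrix) can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `ppca_sketch`, the near-equal-length splitting, the choice of a fixed number of components, and the final projection of the centered series are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np

def ppca_sketch(series, n_segments, n_components):
    """Illustrative PPCA-style reduction (hypothetical helper, not from the paper).

    series: array of shape (T, m), rows are time points, columns are variables.
    Assumes m >= 2 and at least two time points in every segment.
    """
    series = np.asarray(series, dtype=float)

    # Step 1: segment the series along the time axis.
    # Near-equal-length segments are an assumption of this sketch.
    segments = np.array_split(series, n_segments, axis=0)

    # Step 2: covariance matrix of the variables within each segment (m x m).
    covs = [np.cov(seg, rowvar=False) for seg in segments]

    # Step 3: average the segment covariances and run PCA on the average.
    avg_cov = np.mean(covs, axis=0)
    eigvals, eigvecs = np.linalg.eigh(avg_cov)  # eigh: symmetric input, ascending order
    order = np.argsort(eigvals)[::-1]           # sort eigenpairs by descending variance
    top = order[:n_components]
    components = eigvecs[:, top]

    # Share of total variance of the average covariance kept by the chosen components.
    retained = eigvals[top].sum() / eigvals.sum()

    # Assumption: project the globally centered series onto the components.
    reduced = (series - series.mean(axis=0)) @ components
    return reduced, components, retained

# Usage on synthetic data: 120 time points, 5 variables, reduced to 2 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
reduced, components, retained = ppca_sketch(X, n_segments=4, n_components=2)
print(reduced.shape)  # (120, 2)
```

Averaging the segment covariances before the eigendecomposition means a single decomposition of an m × m matrix is needed regardless of the number of segments, which is one plausible reason such an approach could save CPU time.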











Acknowledgements
This work was supported by the Huaqiao University’s High Level Talent Research Start-up Funding Project (14SKBS205), the Ministry of Science & Technology, Taiwan (MOST 109-2511-H-003-049-MY3), the Social Science Planning Project, Fujian (FJ2020B088), and the National Natural Science Foundation of China (71771094).
Ethics declarations
Conflict of interests
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence the work reported in this paper.
Cite this article
Wan, X., Li, H., Zhang, L. et al. Dimensionality reduction for multivariate time-series data mining. J Supercomput 78, 9862–9878 (2022). https://doi.org/10.1007/s11227-021-04303-4
DOI: https://doi.org/10.1007/s11227-021-04303-4