Dimensionality reduction for multivariate time-series data mining

The Journal of Supercomputing

Abstract

Multivariate time series are among the most important objects of study in data mining. Their two distinctive characteristics, the time dimension and the variable dimension, add to the complexity of the algorithms applied to them. Dimensionality reduction is often regarded as an effective way to address this problem. In this paper, we propose a method based on principal component analysis (PCA) to reduce the dimensionality effectively. We call it “piecewise representation based on PCA” (PPCA): it segments a multivariate time series into several sequences, calculates the covariance matrix of the variables for each sequence, and applies PCA to the average covariance matrix to obtain the principal components. Experimental results, including an analysis of retained information, classification, and a comparison of central processing unit (CPU) time consumption, demonstrate that PPCA is superior to prior methods for reducing the dimensionality of multivariate time series.
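
The three steps named in the abstract (segment, compute a covariance matrix per segment, apply PCA to the average covariance matrix) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `ppca_sketch`, the near-equal-length segmentation, the parameters `n_segments` and `n_components`, and the final projection of the globally centered series are all assumptions made for illustration, since the preview does not specify these details.

```python
import numpy as np


def ppca_sketch(series, n_segments, n_components):
    """Illustrative piecewise-PCA sketch (hypothetical helper, not the paper's code).

    series: array of shape (T, m), T time points and m >= 2 variables.
    Returns (reduced, components, explained_ratio).
    """
    series = np.asarray(series, dtype=float)

    # Step 1 (assumption): split along the time axis into near-equal segments.
    # Each segment needs at least two time points for a covariance estimate.
    segments = np.array_split(series, n_segments, axis=0)

    # Step 2: covariance matrix of the variables for each segment, shape (m, m).
    covs = [np.cov(seg, rowvar=False) for seg in segments]

    # Step 3: average the segment covariances and eigen-decompose the result.
    avg_cov = np.mean(covs, axis=0)
    eigvals, eigvecs = np.linalg.eigh(avg_cov)  # eigenvalues in ascending order

    # Keep the eigenvectors with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    explained_ratio = eigvals[order].sum() / eigvals.sum()

    # Assumption: project the globally centered series onto the components.
    reduced = (series - series.mean(axis=0)) @ components
    return reduced, components, explained_ratio


# Usage on synthetic data: 120 time points, 5 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
reduced, components, ratio = ppca_sketch(X, n_segments=4, n_components=2)
print(reduced.shape, components.shape)
```

In this sketch the explained ratio is computed from the eigenvalues of the averaged covariance matrix, which is one plausible way to quantify retained information; the measure used in the paper's experiments may differ.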

Acknowledgements

This work was supported by Huaqiao University’s High Level Talent Research Start-up Funding Project (14SKBS205), the Ministry of Science & Technology, Taiwan (MOST 109-2511-H-003-049-MY3), the Social Science Planning Project, Fujian (FJ2020B088), and the National Natural Science Foundation of China (71771094).

Author information

Corresponding author

Correspondence to Yenchun Jim Wu.

Ethics declarations

Conflict of interests

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wan, X., Li, H., Zhang, L. et al. Dimensionality reduction for multivariate time-series data mining. J Supercomput 78, 9862–9878 (2022). https://doi.org/10.1007/s11227-021-04303-4

  • DOI: https://doi.org/10.1007/s11227-021-04303-4
