Multivariate Time Series Representation and Similarity Search Using PCA

  • Conference paper
Advances in Data Mining. Applications and Theoretical Aspects (ICDM 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10357)

Abstract

Multivariate time series (MTS) data mining has attracted much interest in recent years because a growing number of fields need to manage and process large collections of MTS. In those settings, pattern recognition tasks such as similarity search, clustering, or classification can be challenging because the data are high-dimensional, noisy, and redundant, with correlated features. Dimensionality reduction is consequently often used as a preprocessing step to render the data more manageable. In this paper we propose a novel MTS similarity search approach that addresses these problems through dimensionality reduction and correlation analysis. An important contribution of the proposed technique is a representation that transforms an MTS with a large number of variables into a univariate signal before correlations are sought within the set. The technique relies on unsupervised learning through Principal Component Analysis (PCA) to uncover weights associated with the original input variables and to use them in deriving the univariate signal. We conduct numerous experiments on various benchmark datasets to study the performance of the proposed technique. Compared with major existing techniques, our results indicate increased accuracy and efficiency. We also show that our technique yields improved similarity search accuracy.
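The abstract does not state how the PCA-derived weights are computed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each variable's weight is the sum of its absolute loadings on the leading principal components, scaled by the variance each component explains, and that similarity is measured with the Pearson correlation between the resulting univariate signals. The function names `pca_variable_weights`, `to_univariate`, and `pearson_similarity` are hypothetical.

```python
import numpy as np


def pca_variable_weights(mts, n_components=1):
    """Return one non-negative weight per variable, summing to 1.

    mts is an array of shape (T, m): T time steps, m variables.
    ASSUMPTION: weights are variance-scaled absolute PCA loadings;
    the abstract does not give the actual weighting scheme.
    """
    X = np.asarray(mts, dtype=float)
    X = X - X.mean(axis=0)               # center each variable
    std = X.std(axis=0)
    std[std == 0] = 1.0                  # leave constant variables at zero
    X = X / std                          # standardize so scales are comparable
    # Rows of vt are the principal axes; s holds the singular values.
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    k = min(n_components, vt.shape[0])
    explained = s[:k] ** 2               # proportional to explained variance
    w = (explained[:, None] * np.abs(vt[:k])).sum(axis=0)
    total = w.sum()
    if total == 0:                       # degenerate input: fall back to uniform
        return np.full(X.shape[1], 1.0 / X.shape[1])
    return w / total


def to_univariate(mts, weights):
    """Collapse a (T, m) series into a length-T signal by weighted sum."""
    return np.asarray(mts, dtype=float) @ weights


def pearson_similarity(a, b):
    """Pearson correlation between two equal-length univariate signals."""
    return float(np.corrcoef(a, b)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    query = rng.normal(size=(100, 5))
    candidate = query + 0.1 * rng.normal(size=(100, 5))
    # ASSUMPTION: one shared weight vector, learned from both series stacked.
    w = pca_variable_weights(np.vstack([query, candidate]), n_components=2)
    score = pearson_similarity(to_univariate(query, w),
                               to_univariate(candidate, w))
    print(w, score)
```

Learning a single weight vector from the stacked data keeps the two univariate signals comparable. Whether the paper learns weights per series or per dataset is not stated in the abstract.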

Author information

Corresponding author

Correspondence to Aminata Kane.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kane, A., Shiri, N. (2017). Multivariate Time Series Representation and Similarity Search Using PCA. In: Perner, P. (eds) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2017. Lecture Notes in Computer Science, vol 10357. Springer, Cham. https://doi.org/10.1007/978-3-319-62701-4_10

  • DOI: https://doi.org/10.1007/978-3-319-62701-4_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62700-7

  • Online ISBN: 978-3-319-62701-4

  • eBook Packages: Computer Science, Computer Science (R0)
