Abstract
Given contemporary data collection capacity, data sets containing large numbers of different multivariate time series relating to a common entity (e.g., fMRI scans, financial stocks) are becoming more prevalent. One pervasive question is whether there are patterns or groups of series within the larger data set (e.g., disease patterns in brain scans, or mining stocks that are similar to one another but distinct from banking stocks). There is a relatively large body of literature on clustering methods for univariate and multivariate time series, though most of these methods do not exploit the time dependencies inherent to time series. This paper develops an exploratory data analysis methodology that, in addition to the time dependencies, simultaneously uses the dependency information among the S series themselves and among the p variables within each series, while still retaining the distinctiveness of these two types of variables. This is achieved by combining the principles of canonical correlation analysis and principal component analysis for time series to obtain a new type of covariance/correlation matrix, on which a principal component analysis is performed to produce so-called “principal component time series”. The results are illustrated on two data sets.
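The abstract combines two classical building blocks: canonical correlation analysis (relating two sets of variables) and principal component analysis (turning a covariance/correlation matrix into component time series). The sketch below shows only these two standard ingredients, assuming NumPy and simulated data. It is not the paper's combined covariance/correlation matrix: it uses lag-0 statistics only and ignores the time dependencies the paper additionally exploits. The function names and the simulated random-walk driver are hypothetical, chosen for illustration.

```python
import numpy as np

def standardize(X):
    """Center and scale each column (variable) of a T x k data matrix."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def canonical_correlations(X, Y):
    """Classical (Hotelling-style) canonical correlations between column sets X and Y.

    Computed as the singular values of Sxx^{-1/2} Sxy Syy^{-1/2},
    using lag-0 sample covariances only (no time dependence).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Symmetric inverse square root via eigendecomposition (S assumed positive definite).
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    M = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(M, compute_uv=False)  # returned in descending order

def principal_component_series(X):
    """Ordinary PCA on the lag-0 correlation matrix of a T x k multivariate series.

    Returns eigenvalues (descending) and the T x k matrix of principal component time series.
    """
    Z = standardize(X)
    R = np.corrcoef(Z, rowvar=False)
    w, V = np.linalg.eigh(R)           # eigh returns ascending eigenvalues
    order = np.argsort(w)[::-1]        # reorder to descending
    w, V = w[order], V[:, order]
    return w, Z @ V                    # each column is one component time series

# Illustrative simulated data: two variable sets driven by a shared random walk.
rng = np.random.default_rng(0)
T = 200
common = np.cumsum(rng.normal(size=T))
X = np.column_stack([common + rng.normal(size=T) for _ in range(3)])
Y = np.column_stack([common + rng.normal(size=T) for _ in range(2)])

print(canonical_correlations(X, Y))
eigvals, pcs = principal_component_series(np.hstack([X, Y]))
print(eigvals.round(3))
```

Here the number of canonical correlations is the smaller of the two set sizes, and the component series are mutually uncorrelated with variances equal to the eigenvalues; the paper's method replaces the ordinary correlation matrix with its new CCA-based matrix before the PCA step.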





Acknowledgments
The authors are grateful to anonymous referees and the editor for helpful suggestions which considerably improved the text.
Cite this article
Samadi, S.Y., Billard, L., Meshkani, M.R. et al. Canonical correlation for principal components of time series. Comput Stat 32, 1191–1212 (2017). https://doi.org/10.1007/s00180-016-0667-1