Abstract
Interval time series occur when real-valued intervals of some variable of interest are recorded as an ordered sequence over time. We address the problem of clustering interval time series (ITS) and propose several approaches. First, clustering is performed based on point-to-point comparisons. In alternative approaches, time-domain and wavelet features serve as clustering variables. Furthermore, autocorrelation matrix functions, which gather the autocorrelation and cross-correlation functions of the ITS upper and lower bounds, may be compared using suitable distances (e.g. the Frobenius distance) and used to cluster ITS. An improved procedure to determine the autocorrelation function of ITS is also proposed, and it likewise serves as a basis for clustering. The alternative approaches are explored and their performance compared for ITS simulated under different setups. An application to daily sea level ranges, observed at different locations in Australia, illustrates the proposed methods.
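The autocorrelation-based approach mentioned above can be illustrated with a minimal sketch. It assumes each ITS is stored as two equal-length sequences of lower and upper bounds. It estimates the 2x2 sample correlation matrix R(k) of the bivariate series (lower, upper) at lags k = 1, ..., K using the standard multivariate estimator, then compares two ITS by the square root of the summed squared Frobenius norms of R_1(k) - R_2(k). The function names, the lag range and the way lags are aggregated are illustrative assumptions, not the authors' exact definitions. In particular, this sketch does not implement the improved ITS autocorrelation procedure proposed in the paper.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def correlation_matrix_function(lower: Sequence[float], upper: Sequence[float],
                                max_lag: int) -> List[Matrix]:
    """Sample 2x2 correlation matrices R(k), k = 1..max_lag, of the bivariate
    series (lower_t, upper_t). Entry [i][j] correlates component i at time t
    with component j at time t + k (assumed standard estimator, divisor n)."""
    n = len(lower)
    if len(upper) != n:
        raise ValueError("lower and upper bounds must have the same length")
    if not 0 < max_lag < n:
        raise ValueError("max_lag must satisfy 0 < max_lag < series length")
    series = [list(lower), list(upper)]
    means = [sum(s) / n for s in series]
    centred = [[x - m for x in s] for s, m in zip(series, means)]

    def gamma(i: int, j: int, k: int) -> float:
        # Sample cross-covariance between component i at t and component j at t + k.
        return sum(centred[i][t] * centred[j][t + k] for t in range(n - k)) / n

    var = [gamma(i, i, 0) for i in range(2)]
    if min(var) == 0.0:
        raise ValueError("a constant bound series has no defined correlation")
    return [
        [[gamma(i, j, k) / math.sqrt(var[i] * var[j]) for j in range(2)]
         for i in range(2)]
        for k in range(1, max_lag + 1)
    ]


def frobenius_acf_distance(r1: List[Matrix], r2: List[Matrix]) -> float:
    """Square root of the sum over lags of squared Frobenius norms of R1(k) - R2(k)."""
    if len(r1) != len(r2):
        raise ValueError("both matrix functions must cover the same lags")
    total = 0.0
    for a, b in zip(r1, r2):
        total += sum((a[i][j] - b[i][j]) ** 2 for i in range(2) for j in range(2))
    return math.sqrt(total)


def distance_matrix(its_list: List[Tuple[Sequence[float], Sequence[float]]],
                    max_lag: int) -> List[List[float]]:
    """Pairwise distances between ITS, each given as a (lower, upper) pair."""
    feats = [correlation_matrix_function(lo, up, max_lag) for lo, up in its_list]
    m = len(feats)
    return [[frobenius_acf_distance(feats[a], feats[b]) for b in range(m)]
            for a in range(m)]


if __name__ == "__main__":
    its = [([1, -1, 1, -1], [2, 0, 2, 0]),  # oscillating bounds
           ([1, 2, 3, 4], [2, 3, 4, 5])]    # trending bounds
    print(distance_matrix(its, max_lag=1))  # prints [[0.0, 2.0], [2.0, 0.0]]
```

The resulting distance matrix can be passed to any distance-based clustering method, such as hierarchical clustering or k-medoids. Because correlations are invariant to positive affine rescaling of each bound, this kind of distance groups ITS by their dependence structure rather than by their level or scale.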






Notes
In what follows, we omit the index \(t\) to simplify the notation.
Acknowledgements
The work of P. Teles and P. Brito is financed by the ERDF (European Regional Development Fund) through the Operational Programme for Competitiveness and Internationalisation (COMPETE 2020 Programme) within project "POCI-01-0145-FEDER-006961", and by National Funds through the FCT (Fundação para a Ciência e a Tecnologia, Portuguese Foundation for Science and Technology) as part of project UID/EEA/50014/2013. We thank the associate editor and reviewers for their helpful comments and suggestions.
About this article
Cite this article
Maharaj, E.A., Teles, P. & Brito, P. Clustering of interval time series. Stat Comput 29, 1011–1034 (2019). https://doi.org/10.1007/s11222-018-09851-z