Abstract
Time series clustering has attracted a considerable amount of attention recently. However, clustering massive time series collections incurs a huge computation cost. To reduce this cost, we propose a novel Dimension Hierarchical Decomposition (DHD for short) method to represent time series, together with a corresponding tree structure, denoted DHDTree, that reorganizes time series collections to achieve the best separation effect. The main idea of DHDTree is to adapt the k-d tree to time series by utilizing the DHD representation. When splitting a node, we select the most separable splitting strategy according to a predefined cost model. A fundamental feature of DHDTree is that it mitigates the curse of dimensionality by splitting on compositions of dimensions rather than on a single dimension, aiming for the maximal separation effect. We show that DHDTree achieves both balance and locality, which are important factors for organizing time series efficiently for clustering. With the support of DHDTree, we improve clustering in two respects. First, the DHD representation dramatically decreases the cost of computations between time series. Second, we obtain the cluster centers by exploiting the reorganization of the time series produced by the DHDTree structure. Experiments on both synthetic and real data sets verify the effectiveness and efficiency of the proposed method.
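The splitting idea described above can be illustrated with a minimal sketch. The abstract defines neither the DHD representation nor the cost model, so everything here is an assumption made for illustration: segment means stand in for the representation, the variance of those means across the collection stands in for the cost model, and the names `segment_means`, `best_split` and `build_tree` are hypothetical. It shows only the general shape of a k-d-tree-style split over a group of adjacent dimensions, not the authors' DHDTree.

```python
from statistics import mean, median, pvariance

def segment_means(series, n_segments):
    """Summarize a series by the mean of each equal-width group of dimensions.
    (Stand-in for the DHD representation, which the abstract does not define.)"""
    size = len(series) // n_segments
    return [mean(series[i * size:(i + 1) * size]) for i in range(n_segments)]

def best_split(collection, n_segments):
    """Choose the segment whose means vary most across the collection and
    split at the median of those means. (Assumed cost model: variance.)"""
    reps = [segment_means(s, n_segments) for s in collection]
    scores = [pvariance([r[j] for r in reps]) for j in range(n_segments)]
    j = max(range(n_segments), key=scores.__getitem__)
    return j, median(r[j] for r in reps)

def build_tree(collection, n_segments=2, leaf_size=2):
    """Recursively partition the collection, k-d-tree style, but on a
    composition of dimensions (one segment) instead of a single dimension."""
    if len(collection) <= leaf_size:
        return {"leaf": collection}
    j, threshold = best_split(collection, n_segments)
    left = [s for s in collection if segment_means(s, n_segments)[j] <= threshold]
    right = [s for s in collection if segment_means(s, n_segments)[j] > threshold]
    if not left or not right:  # no separation possible, stop splitting
        return {"leaf": collection}
    return {
        "segment": j,
        "threshold": threshold,
        "left": build_tree(left, n_segments, leaf_size),
        "right": build_tree(right, n_segments, leaf_size),
    }

# Four toy series that differ only in their second half.
toy = [[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 10, 10], [0, 0, 11, 11]]
tree = build_tree(toy)
```

On the toy data the split falls on the second segment, because the first segment is identical across all four series and therefore separates nothing. Splitting at the median keeps the two children the same size here, which is the kind of balance the abstract refers to.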
The work was supported by the Ministry of Science and Technology of China, National Key Research and Development Program under No.2016YFB1000700, National Key Basic Research Program of China under No.2015CB358800 and NSFC (61672163, U1509213).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Li, Q. et al. (2017). Clustering Time Series Utilizing a Dimension Hierarchical Decomposition Approach. In: Candan, S., Chen, L., Pedersen, T., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10177. Springer, Cham. https://doi.org/10.1007/978-3-319-55753-3_16
DOI: https://doi.org/10.1007/978-3-319-55753-3_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55752-6
Online ISBN: 978-3-319-55753-3
eBook Packages: Computer Science, Computer Science (R0)