Clustering Time Series Utilizing a Dimension Hierarchical Decomposition Approach

Li, Qiuhong; Wang, Peng; Wang, Yang; Wang, Wei; Liu, Yimin; Wu, Jiaye; Dou, Danyang

doi:10.1007/978-3-319-55753-3_16

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10177)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Time series clustering has attracted a great deal of attention recently. However, clustering massive time series is challenging because of its huge computation cost. To reduce this cost, we propose a novel Dimension Hierarchical Decomposition (DHD) method to represent time series, together with a corresponding tree structure, denoted DHDTree, that reorganizes time series collections to achieve the best separation effect. The main idea of DHDTree is to adapt the k-d tree to time series by utilizing the DHD representation. When splitting a node, we select the most separable splitting strategy according to a predefined cost model. A fundamental feature of DHDTree is that it overcomes the curse of dimensionality by splitting on compositions of dimensions rather than on a single dimension, aiming to achieve the maximal separation effect. We show that DHDTree has both the balance and the locality properties, which are important for the efficiency of organizing time series for clustering. With the support of DHDTree, we improve clustering in two respects. First, the DHD representation dramatically decreases the cost of computing distances between time series. Second, we obtain the cluster centers by exploiting the reorganization of the time series produced by DHDTree. Experiments on both synthetic and real data sets verify the effectiveness and efficiency of the proposed method.
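The abstract does not define the DHD representation or the cost model, so the Python sketch below is only an illustration of the general idea under stated assumptions, not the authors' DHDTree. It assumes equal-length series and uses multi-level segment means (a PAA-style pyramid) as a hypothetical stand-in for the DHD representation, so that each feature is a composition of several raw dimensions. The cost model is also hypothetical: split on the composed feature with the largest variance, at its median, as a classic k-d tree would. The leaves of the resulting balanced partition then supply initial cluster centers. All function names (`segment_means`, `features`, `split`, `initial_centers`) are illustrative.

```python
from statistics import mean, pvariance


def segment_means(series, level):
    """Means of the 2**level contiguous segments of `series`.

    Hypothetical stand-in for the DHD representation, not the paper's definition.
    """
    n, k = len(series), 2 ** level
    assert n >= k, "series too short for this level"
    # Integer boundaries; every segment is non-empty because n >= k.
    bounds = [i * n // k for i in range(k + 1)]
    return [mean(series[bounds[i]:bounds[i + 1]]) for i in range(k)]


def features(series, max_level):
    """Concatenate segment means for levels 0..max_level, coarse to fine.

    Each feature aggregates several raw dimensions, i.e. it is a
    composition of dimensions rather than a single time point.
    """
    out = []
    for level in range(max_level + 1):
        out.extend(segment_means(series, level))
    return out


def split(items):
    """Split (features, series) pairs in two at the median of one feature.

    Hypothetical cost model: pick the composed feature with the largest
    variance, as a classic k-d tree does with raw coordinates.
    """
    dims = len(items[0][0])
    best = max(range(dims), key=lambda d: pvariance([f[d] for f, _ in items]))
    items = sorted(items, key=lambda item: item[0][best])
    mid = len(items) // 2  # median split keeps the two children balanced
    return items[:mid], items[mid:]


def initial_centers(data, k, max_level=2):
    """Partition `data` into k leaves and return each leaf's raw-space centroid."""
    assert 1 <= k <= len(data)
    leaves = [[(features(s, max_level), s) for s in data]]
    while len(leaves) < k:
        # Fewer leaves than series, so the largest leaf holds at least two.
        leaves.sort(key=len)
        left, right = split(leaves.pop())
        leaves.extend([left, right])
    # Column-wise mean of the raw series in each leaf.
    return [[mean(col) for col in zip(*(s for _, s in leaf))] for leaf in leaves]
```

For example, calling `initial_centers` with `k=2` on two well-separated groups of series returns one centroid per group, which could then seed an ordinary k-means pass. Median splits keep the leaves balanced; how well the leaves preserve locality depends entirely on the assumed features and cost model.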

The work was supported by the Ministry of Science and Technology of China, National Key Research and Development Program under No.2016YFB1000700, National Key Basic Research Program of China under No.2015CB358800 and NSFC (61672163, U1509213).

Author information

Authors and Affiliations

School of Computer Science, Fudan University, Shanghai, China
Qiuhong Li, Peng Wang, Yang Wang, Wei Wang, Jiaye Wu & Danyang Dou
Third Affiliated Hospital of Second Military Medical University, Shanghai, China
Yimin Liu

Corresponding author

Correspondence to Peng Wang.

Editor information

Editors and Affiliations

Arizona State University, Tempe-Phoenix, Arizona, USA
Selçuk Candan
Hong Kong University of Science and Technology, Hong Kong, China
Lei Chen
Aalborg University, Aalborg, Denmark
Torben Bach Pedersen
University of New South Wales, Sydney, New South Wales, Australia
Lijun Chang
The University of Queensland, Brisbane, Queensland, Australia
Wen Hua

About this paper

Cite this paper

Li, Q. et al. (2017). Clustering Time Series Utilizing a Dimension Hierarchical Decomposition Approach. In: Candan, S., Chen, L., Pedersen, T., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10177. Springer, Cham. https://doi.org/10.1007/978-3-319-55753-3_16

DOI: https://doi.org/10.1007/978-3-319-55753-3_16
Published: 22 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55752-6
Online ISBN: 978-3-319-55753-3
eBook Packages: Computer Science, Computer Science (R0)