Clustering Time Series Utilizing a Dimension Hierarchical Decomposition Approach

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10177)

Abstract

Time series clustering has attracted a great deal of attention recently. However, clustering massive time series collections incurs a huge computation cost. To reduce this cost, we propose a novel Dimension Hierarchical Decomposition (DHD for short) method to represent time series, together with a corresponding tree structure, denoted DHDTree, that reorganizes time series collections to achieve the best separation effect. The main idea of DHDTree is to adapt the k-d tree to time series by utilizing the DHD representation. When splitting, we select the most separable splitting strategy according to a predefined cost model. A fundamental feature of DHDTree is that it overcomes the curse of dimensionality by leveraging compositions of dimensions, rather than selecting only a single dimension, when splitting, aiming to achieve the maximal separation effect. We show that DHDTree possesses both the balance and the locality properties, which are important factors in organizing time series efficiently for clustering. With the support of DHDTree, we improve clustering in two respects. First, the DHD representation dramatically decreases the computation cost between time series. Second, we obtain the centers by exploiting the reorganization of the time series under the proposed DHDTree structure. Experiments on both synthetic and real data sets verify the effectiveness and efficiency of the proposed method.
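
The splitting idea described above can be illustrated with a small sketch. The abstract does not specify the DHD representation or the cost model, so everything here is an assumption made for illustration: a "dimension composition" is taken to be the mean of a contiguous range of time points, the candidate ranges come from recursively halving the full range, and separability is scored by the variance of the composite values. The names `candidate_ranges`, `composite`, and `build` are hypothetical and are not taken from the paper.

```python
from statistics import median, pvariance


def candidate_ranges(n_dims, levels):
    """Assumed decomposition: halve [0, n_dims) repeatedly and keep every segment."""
    ranges = [(0, n_dims)]
    frontier = [(0, n_dims)]
    for _ in range(levels):
        next_frontier = []
        for lo, hi in frontier:
            if hi - lo < 2:
                continue  # a single time point cannot be halved further
            mid = (lo + hi) // 2
            next_frontier += [(lo, mid), (mid, hi)]
        ranges += next_frontier
        frontier = next_frontier
    return ranges


def composite(series, dim_range):
    """Assumed composition: the mean of the series over one dimension range."""
    lo, hi = dim_range
    return sum(series[lo:hi]) / (hi - lo)


def build(data, leaf_size=2, levels=2):
    """Build a k-d-tree-like structure that splits on a composite of dimensions."""
    if len(data) <= leaf_size:
        return {"leaf": data}
    n_dims = len(data[0])
    # Assumed cost model: prefer the range whose composite values vary the most.
    best = max(
        candidate_ranges(n_dims, levels),
        key=lambda r: pvariance([composite(s, r) for s in data]),
    )
    values = [composite(s, best) for s in data]
    cut = median(values)  # a median cut keeps the two children roughly balanced
    left = [s for s, v in zip(data, values) if v <= cut]
    right = [s for s, v in zip(data, values) if v > cut]
    if not left or not right:
        return {"leaf": data}  # every composite value is equal; stop splitting
    return {
        "range": best,
        "cut": cut,
        "left": build(left, leaf_size, levels),
        "right": build(right, leaf_size, levels),
    }


if __name__ == "__main__":
    series = [[0, 0, 0, 0], [0, 0, 1, 1], [10, 10, 10, 10], [10, 10, 11, 11]]
    tree = build(series)
    print(tree["range"], tree["cut"])
```

Splitting at the median is what keeps this toy tree balanced, and scoring whole ranges rather than single time points is meant to echo the abstract's point about using dimension compositions; the paper's actual cost model may differ substantially.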

The work was supported by the Ministry of Science and Technology of China, National Key Research and Development Program under No.2016YFB1000700, National Key Basic Research Program of China under No.2015CB358800 and NSFC (61672163, U1509213).

Corresponding author

Correspondence to Peng Wang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Li, Q. et al. (2017). Clustering Time Series Utilizing a Dimension Hierarchical Decomposition Approach. In: Candan, S., Chen, L., Pedersen, T., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10177. Springer, Cham. https://doi.org/10.1007/978-3-319-55753-3_16

  • DOI: https://doi.org/10.1007/978-3-319-55753-3_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55752-6

  • Online ISBN: 978-3-319-55753-3

  • eBook Packages: Computer Science (R0)
