A BIRCH-Based Clustering Method for Large Time Series Databases

Le Quy Nhon, Vo; Anh, Duong Tuan

doi:10.1007/978-3-642-28320-8_13

A BIRCH-Based Clustering Method for Large Time Series Databases

Vo Le Quy Nhon²³ &
Duong Tuan Anh²³

Conference paper

1522 Accesses
2 Citations

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 7104))

Abstract

This paper presents a novel approach for time series clustering which is based on BIRCH algorithm. Our BIRCH-based approach performs clustering of time series data with a multi-resolution transform used as feature extraction technique. Our approach hinges on the use of cluster feature (CF) tree that helps to resolve the dilemma associated with the choices of initial centers and significantly improves the execution time and clustering quality. Our BIRCH-based approach not only takes full advantages of BIRCH algorithm in the capacity of handling large databases but also can be viewed as a flexible clustering framework in which we can apply any selected clustering algorithm in Phase 3 of the framework. Experimental results show that our proposed approach performs better than k-Means in terms of clustering quality and running time, and better than I-k-Means in terms of clustering quality with nearly the same running time.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Chan, K., Fu, W.: Efficient time series matching by wavelets. In: Proceedings of the 15th IEEE Intl. Conf. on Data Engineering (ICDE 1999), March 23-26, pp. 126–133 (1999)
Google Scholar
Gavrilov, M., Anguelov, M., Indyk, P., Motwani, R.: Mining The Stock Market: Which Measure is Best? In: Proc. of 6th ACM Conf. on Knowledge Discovery and Data Mining, Boston, MA, August 20-23, pp. 487–496 (2000)
Google Scholar
Halkdi, M., Batistakis, Y., Vizirgiannis, M.: On Clustering Validation Techniques. J. Intelligent Information Systems 17(2-3), 107–145 (2001)
Article Google Scholar
Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. Morgan Kaufmann (2006)
Google Scholar
Kalpakis, K., Gada, D., Puttagunta, V.: Distance Measures for Effective Clustering of ARIMA Time Series. In: Proc. of 2001 IEEE Int. Conf. on Data Mining, pp. 273–280 (2001)
Google Scholar
Keogh, E., Folias, T.: The UCR Time Series Data Mining Archive (2002), http://www.cs.ucr.edu/~eamonn/TSDMA/index.html
Lin, J., Vlachos, M., Keogh, E.J., Gunopulos, D.: Iterative Incremental Clustering of Time Series. In: Hwang, J., Christodoulakis, S., Plexousakis, D., Christophides, V., Koubarakis, M., Böhm, K. (eds.) EDBT 2004. LNCS, vol. 2992, pp. 106–122. Springer, Heidelberg (2004)
Chapter Google Scholar
Cao, L.: In-depth Behavior Understanding and Use: the Behavior Informatics Approach. Information Science 180(17), 3067–3085 (2010)
Article Google Scholar
May, P., Ehrlich, H.-C., Steinke, T.: ZIB Structure Prediction Pipeline: Composing a Complex Biological Workflow Through Web Services. In: Nagel, W.E., Walter, W.V., Lehner, W. (eds.) Euro-Par 2006. LNCS, vol. 4128, pp. 1148–1158. Springer, Heidelberg (2006)
Chapter Google Scholar
Redmond, S., Heneghan, C.: A Method for Initialization the k-Means Clustering Algorithm Using kd-Trees. Pattern Recognition Letters (2007)
Google Scholar
Strehl, A., Ghosh, J.: Cluster Ensembles – A Knowledge Reuse Framework for Combining Multiple Partitions. J. of Machine Learning Research 3(3), 583–617 (2002)
MathSciNet MATH Google Scholar
Zhang, H., Ho, T.B., Zhang, Y., Lin, M.S.: Unsupervised Feature Extraction for Time Series Clustering Using Orthogonal Wavelet Transform. Journal Informatica 30(3), 305–319 (2006)
MathSciNet MATH Google Scholar
Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: A new data clustering algorithm and its applications. Journal of Data Mining and Knowledge Discovery 1(2), 141–182 (1997)
Article Google Scholar
Historical Data for S&P 500 Stocks, http://kumo.swcp.com/stocks/

Download references

Author information

Authors and Affiliations

Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology, Vietnam
Vo Le Quy Nhon & Duong Tuan Anh

Authors

Vo Le Quy Nhon
View author publications
You can also search for this author in PubMed Google Scholar
Duong Tuan Anh
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Faculty of Engineering and Information Technology, University of Technology Sydney, Broadway, PO Box 123, NSW 2007, Sydney, Australia
Longbing Cao
Shenzhen Institute of Advanced Technology (SIAT), Chinese Academy of Sciences, 518055, Shenzhen, China
Joshua Zhexue Huang & Jun Luo &
The University of Melbourne, VIC 3010, Melbourne, Australia
James Bailey
The University of Auckland, Auckland, New Zealand
Yun Sing Koh

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Le Quy Nhon, V., Anh, D.T. (2012). A BIRCH-Based Clustering Method for Large Time Series Databases. In: Cao, L., Huang, J.Z., Bailey, J., Koh, Y.S., Luo, J. (eds) New Frontiers in Applied Data Mining. PAKDD 2011. Lecture Notes in Computer Science(), vol 7104. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28320-8_13

Download citation

DOI: https://doi.org/10.1007/978-3-642-28320-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28319-2
Online ISBN: 978-3-642-28320-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics