Abstract
This paper presents a novel approach to time series clustering based on the BIRCH algorithm. The approach clusters time series data using a multi-resolution transform as the feature extraction technique. It hinges on the cluster feature (CF) tree, which helps resolve the dilemma of choosing initial centers and significantly improves both execution time and clustering quality. The approach not only takes full advantage of BIRCH's capacity for handling large databases but can also be viewed as a flexible clustering framework in which any selected clustering algorithm can be applied in Phase 3. Experimental results show that the proposed approach outperforms k-Means in both clustering quality and running time, and outperforms I-k-Means in clustering quality with nearly the same running time.
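To make the pipeline described in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Haar-style pairwise averaging as the multi-resolution transform and replaces the CF tree with a flat list of CF entries; the names `haar_approximation`, `ClusterFeature`, `build_cf_entries`, and the `threshold` parameter are hypothetical. Only the CF triple (N, LS, SS) and its additive update follow the standard BIRCH definition.

```python
import math


def haar_approximation(series, levels):
    """Average adjacent pairs `levels` times (unnormalized Haar approximation).

    Assumption: this stands in for the multi-resolution transform; the paper
    may use a different transform or normalization.
    """
    out = list(series)
    for _ in range(levels):
        if len(out) < 2 or len(out) % 2 != 0:
            break  # stop when the series can no longer be halved evenly
        out = [(out[i] + out[i + 1]) / 2.0 for i in range(0, len(out), 2)]
    return out


class ClusterFeature:
    """BIRCH cluster feature: point count N, linear sum LS, squared sum SS."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim
        self.ss = 0.0

    def add(self, point):
        # CF entries are additive, so absorbing a point is a constant-size update.
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius_if_added(self, point):
        # Radius after a tentative insert: R^2 = SS/N - ||LS/N||^2.
        n = self.n + 1
        ls = [a + b for a, b in zip(self.ls, point)]
        ss = self.ss + sum(x * x for x in point)
        r2 = ss / n - sum((x / n) ** 2 for x in ls)
        return math.sqrt(max(r2, 0.0))  # clamp tiny negative rounding errors


def build_cf_entries(points, threshold):
    """Flat simplification of CF-tree insertion (hypothetical helper).

    Each point joins its nearest entry if the resulting radius stays within
    `threshold`; otherwise it starts a new entry.
    """
    entries = []
    for p in points:
        best, best_dist = None, math.inf
        for cf in entries:
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(cf.centroid(), p)))
            if d < best_dist:
                best, best_dist = cf, d
        if best is not None and best.radius_if_added(p) <= threshold:
            best.add(p)
        else:
            cf = ClusterFeature(len(p))
            cf.add(p)
            entries.append(cf)
    return entries


# Toy data: two series near level 1 and two near level 5 (illustrative only).
series_db = [
    [1.0, 1.2, 0.9, 1.1, 1.0, 1.0, 1.1, 0.9],
    [1.1, 1.0, 1.0, 1.2, 0.9, 1.1, 1.0, 1.0],
    [5.0, 5.1, 4.9, 5.0, 5.2, 5.0, 4.8, 5.0],
    [5.1, 5.0, 5.0, 4.9, 5.0, 5.1, 5.0, 4.9],
]
features = [haar_approximation(s, levels=2) for s in series_db]
entries = build_cf_entries(features, threshold=0.5)
centroids = [cf.centroid() for cf in entries]
```

In this sketch the entry centroids play the role of the compact summary that a Phase 3 algorithm (for example k-Means) would cluster, which is why no initial centers need to be guessed from the raw series.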
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Le Quy Nhon, V., Anh, D.T. (2012). A BIRCH-Based Clustering Method for Large Time Series Databases. In: Cao, L., Huang, J.Z., Bailey, J., Koh, Y.S., Luo, J. (eds) New Frontiers in Applied Data Mining. PAKDD 2011. Lecture Notes in Computer Science, vol 7104. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28320-8_13
DOI: https://doi.org/10.1007/978-3-642-28320-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28319-2
Online ISBN: 978-3-642-28320-8
eBook Packages: Computer Science, Computer Science (R0)