Abstract
Uncertainty is inherent in data streams and presents new challenges for data stream mining. Because data streams arrive continuously and are very large, representing and clustering uncertain time series data streams requires significantly more space. It is therefore important to construct a compressed representation for storing uncertain time series data. Granular sketches and a bucket policy are designed using hash-compressed storage and micro-clusters. Then, based on the max-min cluster distance measure, an initial cluster center selection algorithm is proposed to improve the quality of clustering uncertain data streams. Finally, the effectiveness of the proposed algorithm is demonstrated through an analysis of the experimental results.
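The abstract does not spell out the selection algorithm, so the following is only a minimal sketch of the generic max-min (farthest-first) heuristic that the phrase "max-min cluster distance" usually refers to. It assumes plain Euclidean points rather than granular sketches or uncertain values, and the function name `max_min_centers` and the choice of the first point as the seed are illustrative assumptions, not the paper's method.

```python
import math


def max_min_centers(points, k):
    """Pick k initial centers with the generic max-min (farthest-first) rule.

    Assumption: points are plain coordinate tuples compared with Euclidean
    distance; the paper's sketch-based representation is not modeled here.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    # Assumption: seed with the first point; other seeding rules are possible.
    centers = [points[0]]
    # nearest[i] holds the distance from points[i] to its closest chosen center.
    nearest = [math.dist(p, centers[0]) for p in points]

    while len(centers) < k:
        # Choose the point whose distance to its closest center is largest.
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        # Update each point's closest-center distance with the new center.
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[idx]))

    return centers


if __name__ == "__main__":
    sample = [(0, 0), (1, 0), (10, 0), (10, 1), (0, 9)]
    print(max_min_centers(sample, 3))
```

Spreading the initial centers far apart in this way tends to avoid starting two centers inside the same dense region, which is the usual motivation for max-min seeding before an iterative clustering step.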
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Chen, J., Chen, P., Sheng, X. (2013). Granular Sketch Based Uncertain Time Series Streams Clustering. In: Yang, Y., Ma, M., Liu, B. (eds) Information Computing and Applications. ICICA 2013. Communications in Computer and Information Science, vol 391. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53932-9_53
DOI: https://doi.org/10.1007/978-3-642-53932-9_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53931-2
Online ISBN: 978-3-642-53932-9
eBook Packages: Computer Science (R0)