Abstract
Clustering is one of the most important tasks in data mining. Algorithms such as Fuzzy C-Means and Possibilistic C-Means give good results for both static data and data streams. Such algorithms compute cluster centers from a chunk of data, which takes a lot of time. If data arrive faster than the algorithm can process them, part of the data will be lost. To prevent this, a pre-processing algorithm should be used. The purpose of this paper is to propose a pre-processing method for clustering algorithms. Experimental results show that the proposed method is suited to handling noisy data and can reduce processing time.
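The abstract does not say how the proposed pre-processing works, so the following is only a generic illustration of the idea, not the authors' method. It assumes one-dimensional data, summarizes a chunk into a few weighted grid cells, and then runs a weighted Fuzzy C-Means update on the summary instead of the raw points. The names grid_summarize and weighted_fcm, the cell size, and the sample values are all hypothetical.

```python
from collections import defaultdict


def grid_summarize(points, cell_size):
    """Collapse 1-D points into (cell mean, count) pairs, one per occupied cell.

    Hypothetical pre-processing step: the clustering stage then works on
    a handful of weighted representatives instead of every raw point.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x in points:
        key = int(x // cell_size)  # index of the grid cell containing x
        sums[key] += x
        counts[key] += 1
    return [(sums[k] / counts[k], counts[k]) for k in sorted(counts)]


def weighted_fcm(values, weights, centers, m=2.0, iters=50, eps=1e-9):
    """Fuzzy C-Means on weighted 1-D values, starting from given centers.

    Membership: u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)).
    Center:     v_i = sum(w * u_i^m * x) / sum(w * u_i^m).
    """
    centers = list(centers)
    for _ in range(iters):
        num = [0.0] * len(centers)
        den = [0.0] * len(centers)
        for x, w in zip(values, weights):
            # eps keeps distances positive when a center sits on a point
            dist = [abs(x - c) + eps for c in centers]
            for i, di in enumerate(dist):
                u = 1.0 / sum((di / dj) ** (2.0 / (m - 1.0)) for dj in dist)
                num[i] += w * (u ** m) * x
                den[i] += w * (u ** m)
        centers = [n / d for n, d in zip(num, den)]
    return centers


# One chunk of a stream with two obvious groups (made-up values).
chunk = [0.0, 0.1, 0.2, 0.1, 10.0, 10.1, 9.9, 10.2]
summary = grid_summarize(chunk, cell_size=1.0)
values = [v for v, _ in summary]
weights = [w for _, w in summary]
centers = weighted_fcm(values, weights, centers=[1.0, 9.0])
print(summary)
print(centers)
```

The clustering loop here iterates over the occupied cells rather than over every point in the chunk, which is where a saving in processing time could come from; whether accuracy is preserved depends on the cell size, which is an assumption of this sketch.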
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Duda, P., Jaworski, M., Pietruczuk, L. (2012). On Pre-processing Algorithms for Data Stream. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2012. Lecture Notes in Computer Science, vol 7268. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29350-4_7
DOI: https://doi.org/10.1007/978-3-642-29350-4_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29349-8
Online ISBN: 978-3-642-29350-4
eBook Packages: Computer Science, Computer Science (R0)