On Pre-processing Algorithms for Data Stream

  • Conference paper
Artificial Intelligence and Soft Computing (ICAISC 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7268)

Abstract

Clustering is one of the most important tasks in data mining. Algorithms such as Fuzzy C-Means and Possibilistic C-Means give good results for both static data and data streams. All clustering algorithms compute cluster centers from a chunk of data, which takes considerable time. If data arrive faster than the algorithm can process them, part of the data will be lost. To prevent this, a pre-processing algorithm should be used. The purpose of this paper is to propose a pre-processing method for clustering algorithms. Experimental results show that the proposed method handles noisy data well and can reduce processing time.
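
To make the cost of the per-chunk center computation concrete, the following is a minimal sketch of the standard Fuzzy C-Means iteration applied to a single chunk. It is not the pre-processing method proposed in the paper, which the abstract does not describe. The function name `fuzzy_c_means`, the fuzzifier default `m=2.0`, the fixed iteration count, and the seeded random initialization are all assumptions made for illustration.

```python
import random


def fuzzy_c_means(chunk, c, m=2.0, iters=50, seed=0):
    """Standard Fuzzy C-Means on one chunk (a list of equal-length float lists).

    Illustrative only: parameter names and defaults are assumptions,
    not taken from the paper.
    """
    rng = random.Random(seed)
    dims = len(chunk[0])
    # Initialize the c centers with distinct points drawn from the chunk.
    centers = [list(p) for p in rng.sample(chunk, c)]
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1)).
        memberships = []
        for p in chunk:
            # Clamp distances away from zero to avoid division by zero.
            d = [max(sum((a - b) ** 2 for a, b in zip(p, v)) ** 0.5, 1e-12)
                 for v in centers]
            memberships.append(
                [1.0 / sum((d[i] / d[j]) ** exponent for j in range(c))
                 for i in range(c)]
            )
        # Center update: mean of the chunk weighted by u_ik^m.
        for i in range(c):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            centers[i] = [
                sum(w * p[k] for w, p in zip(weights, chunk)) / total
                for k in range(dims)
            ]
    return centers, memberships


chunk = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
         [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
centers, memberships = fuzzy_c_means(chunk, c=2)
print(sorted(centers))
```

Every iteration visits each point once per center, so the work per chunk grows with chunk size, number of clusters, and number of iterations. This is the cost that a pre-processing step applied before clustering would aim to reduce.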

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Duda, P., Jaworski, M., Pietruczuk, L. (2012). On Pre-processing Algorithms for Data Stream. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2012. Lecture Notes in Computer Science, vol 7268. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29350-4_7

  • DOI: https://doi.org/10.1007/978-3-642-29350-4_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29349-8

  • Online ISBN: 978-3-642-29350-4

  • eBook Packages: Computer Science, Computer Science (R0)
