Clustering Heterogeneous Data Streams with Uncertainty over Sliding Window

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8216)

Abstract

Existing methods for clustering uncertain data streams over sliding windows do not handle categorical attributes, even though uncertain mixed data are ubiquitous. This paper investigates the problem of clustering heterogeneous data streams pervaded by uncertainty over sliding windows, called SWHU-Clustering. A Heterogeneous Uncertain Temporal Cluster Feature (HUTCF) is introduced to monitor the distribution statistics of mixed data points. Based on this structure, the Exponential Histogram of Heterogeneous Uncertain Cluster Feature (EHHUCF) is presented as a collection of HUTCFs. This structure may help to handle in-cluster evolution and to detect temporal changes in the cluster distribution. Our approach has several advantages over existing methods: 1) it achieves higher execution efficiency because its design avoids the effects of old data on the final results; 2) it incorporates k-NN into the clustering process to reduce the complexity of the algorithm; 3) memory consumption can be managed efficiently by limiting the number of HUTCFs in each EHHUCF. Simulations on real databases show the feasibility of SWHU-Clustering, and a comparison with the UMicro algorithm shows its effectiveness.
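The abstract does not give the definitions of HUTCF or EHHUCF, so the following is only an illustrative sketch of the general idea: an additive summary of mixed, uncertain points, kept in a window-bounded list with a cap on the number of summaries. The class names `MixedClusterFeature` and `WindowedFeatures`, the use of an existence probability as a point weight, and the rule of merging the two oldest summaries are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter


class MixedClusterFeature:
    """Hypothetical additive summary of mixed (numeric + categorical) points."""

    def __init__(self, n_numeric, n_categorical):
        self.weight = 0.0                       # sum of existence probabilities
        self.linear_sum = [0.0] * n_numeric     # weighted sum per numeric attribute
        self.square_sum = [0.0] * n_numeric     # weighted sum of squares
        self.cat_counts = [Counter() for _ in range(n_categorical)]
        self.timestamp = -1                     # arrival time of the newest point

    def add(self, numeric, categorical, prob, t):
        """Absorb one point; prob is its assumed existence probability."""
        self.weight += prob
        for i, x in enumerate(numeric):
            self.linear_sum[i] += prob * x
            self.square_sum[i] += prob * x * x
        for counts, value in zip(self.cat_counts, categorical):
            counts[value] += prob
        self.timestamp = max(self.timestamp, t)

    def merge(self, other):
        """Fold another summary into this one (all fields are additive)."""
        self.weight += other.weight
        for i in range(len(self.linear_sum)):
            self.linear_sum[i] += other.linear_sum[i]
            self.square_sum[i] += other.square_sum[i]
        for mine, theirs in zip(self.cat_counts, other.cat_counts):
            mine.update(theirs)                 # Counter.update adds counts
        self.timestamp = max(self.timestamp, other.timestamp)

    def centroid(self):
        """Weighted mean of the numeric attributes."""
        return [s / self.weight for s in self.linear_sum]


class WindowedFeatures:
    """Hypothetical capped list of summaries restricted to a sliding window."""

    def __init__(self, window, max_features):
        self.window = window
        self.max_features = max_features
        self.features = []                      # oldest first

    def insert(self, feature, now):
        self.features.append(feature)
        # Drop summaries whose newest point has left the window.
        self.features = [f for f in self.features
                         if f.timestamp > now - self.window]
        # Bound memory: merge the oldest summary into the next oldest.
        while len(self.features) > self.max_features:
            oldest = self.features.pop(0)
            self.features[0].merge(oldest)
```

Because every field is a sum, two summaries can be merged without revisiting the raw points, and expired summaries can simply be dropped; this is the kind of property that lets a sliding-window method bound memory and discard old data.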

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hentech, H., Gouider, M.S., Farhat, A. (2013). Clustering Heterogeneous Data Streams with Uncertainty over Sliding Window. In: Cuzzocrea, A., Maabout, S. (eds) Model and Data Engineering. MEDI 2013. Lecture Notes in Computer Science, vol 8216. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41366-7_14

  • DOI: https://doi.org/10.1007/978-3-642-41366-7_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41365-0

  • Online ISBN: 978-3-642-41366-7

  • eBook Packages: Computer Science, Computer Science (R0)
