DMM-Stream: A Density Mini-Micro Clustering Algorithm for Evolving Data Streams

  • Conference paper
Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013)

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 285)

Abstract

Clustering real-time stream data is an important and challenging problem. Existing algorithms do not consider the distribution of data inside a micro cluster, particularly when the data points are non-uniformly distributed within it. In that situation, a large radius has to be used for the micro cluster, which lowers clustering quality. In this paper, we present DMM-Stream, a density-based clustering algorithm for evolving data streams. It is an online-offline algorithm that takes the distribution of data inside each micro cluster into account. DMM-Stream introduces mini-micro clusters, which keep summary information about the data points inside a micro cluster. Based on the distribution of the dense areas inside the micro cluster, at least one representative point, either the center of the micro cluster itself or the centers of its mini-micro clusters, is sent to the offline phase. By choosing proper mini-micro and micro centers, we increase cluster quality while maintaining the time complexity. A pruning strategy also separates real data from noise by distinguishing dense and sparse mini-micro and micro clusters. Our performance study over real and synthetic data sets demonstrates the effectiveness of our method.
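The idea of keeping mini-micro summaries inside a micro cluster and choosing representatives for the offline phase can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the update or selection rules, so the class names, the `mini_radius` and `dense_threshold` parameters, the nearest-center assignment, and the rule "send mini-micro centers only when more than one mini-micro cluster is dense" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from math import dist
from typing import List


@dataclass
class MiniMicroCluster:
    """Summary (linear sum and weight) of points in a small region.
    Hypothetical structure; the paper's exact summary is not given here."""
    linear_sum: List[float]
    weight: float = 0.0

    def add(self, point: List[float]) -> None:
        for i, x in enumerate(point):
            self.linear_sum[i] += x
        self.weight += 1.0

    @property
    def center(self) -> List[float]:
        return [s / self.weight for s in self.linear_sum]


@dataclass
class MicroCluster:
    dim: int
    mini_radius: float  # assumed parameter: max distance to join a mini-micro cluster
    minis: List[MiniMicroCluster] = field(default_factory=list)

    def add(self, point: List[float]) -> None:
        # Assumed rule: absorb the point into the nearest mini-micro cluster
        # within mini_radius, otherwise start a new one.
        best, best_d = None, self.mini_radius
        for m in self.minis:
            d = dist(m.center, point)
            if d <= best_d:
                best, best_d = m, d
        if best is None:
            best = MiniMicroCluster([0.0] * self.dim)
            self.minis.append(best)
        best.add(point)

    @property
    def center(self) -> List[float]:
        total = sum(m.weight for m in self.minis)
        return [sum(m.linear_sum[i] for m in self.minis) / total
                for i in range(self.dim)]

    def representatives(self, dense_threshold: float) -> List[List[float]]:
        # Assumed rule: several dense regions -> send their centers;
        # otherwise the micro cluster's own center represents it.
        dense = [m for m in self.minis if m.weight >= dense_threshold]
        if len(dense) > 1:
            return [m.center for m in dense]
        return [self.center]


# Two separated dense regions inside one micro cluster.
mc = MicroCluster(dim=2, mini_radius=0.3)
for p in [(0, 0), (0.1, 0), (0, 0.1), (1, 1), (1.1, 1), (1, 1.1)]:
    mc.add(list(p))
reps = mc.representatives(dense_threshold=3)
```

In this toy input the points form two tight groups, so two mini-micro centers are returned rather than the single micro center that lies in the empty space between them, which is the situation the abstract describes for non-uniformly distributed data.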

Copyright information

© 2014 Springer Science+Business Media Singapore

About this paper

Cite this paper

Amini, A., Saboohi, H., Wah, T.Y., Herawan, T. (2014). DMM-Stream: A Density Mini-Micro Clustering Algorithm for Evolving Data Streams. In: Herawan, T., Deris, M., Abawajy, J. (eds) Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013). Lecture Notes in Electrical Engineering, vol 285. Springer, Singapore. https://doi.org/10.1007/978-981-4585-18-7_76

  • DOI: https://doi.org/10.1007/978-981-4585-18-7_76

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-4585-17-0

  • Online ISBN: 978-981-4585-18-7

  • eBook Packages: Engineering, Engineering (R0)
