A Density-Grid Based Clustering Algorithm on Data Stream Using Resilient Distributed Datasets

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9673)

Abstract

To rapidly find clusters of arbitrary shape in a continuously growing data stream, this paper proposes the GDRDD-Stream algorithm. First, to capture the evolving characteristics of the data stream, we define an effective time for each data point and design an elimination strategy based on that effective time to remove historical data. Second, we design a partitioning method based on resilient distributed datasets to balance the computing load across nodes. Finally, we improve the traditional DBSCAN algorithm so that clustering can be computed in parallel across partitions. The experimental results show that the proposed algorithm rapidly clusters data streams distributed in arbitrary shapes, captures the evolving behavior of the data stream, and outperforms the CluStream algorithm in both performance and clustering quality.
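The abstract gives no pseudocode, so the following is only a minimal single-process sketch of two of the ideas it names: expiring points by an effective time, and clustering on a density grid. It does not use Spark or RDDs and is not the authors' GDRDD-Stream implementation. The function names `expire` and `grid_cluster`, the parameters `effective_time`, `cell_size`, and `min_pts`, and the rule "a cell is dense if it holds at least `min_pts` points" are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def expire(points, now, effective_time):
    """Drop points older than the (assumed) effective time window.

    Each point is a (timestamp, (x, y)) pair.
    """
    return [(t, p) for t, p in points if now - t <= effective_time]


def grid_cluster(points, cell_size, min_pts):
    """Label dense 2-D grid cells; adjacent dense cells share a label.

    Hypothetical density rule: a cell is dense if it holds >= min_pts points.
    Returns a dict mapping each dense cell (ix, iy) to a cluster id.
    """
    counts = defaultdict(int)
    for _, (x, y) in points:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    dense = {c for c, n in counts.items() if n >= min_pts}

    labels = {}
    next_label = 0
    for start in sorted(dense):
        if start in labels:
            continue
        # Breadth-first search over the 8-neighbourhood of dense cells.
        labels[start] = next_label
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in labels:
                        labels[nb] = next_label
                        queue.append(nb)
        next_label += 1
    return labels


# Illustrative stream: two stale points plus six recent ones.
stream = [
    (-100, (9.0, 9.0)), (-100, (9.1, 9.2)),
    (0, (0.1, 0.1)), (1, (0.2, 0.3)),
    (2, (1.1, 0.2)), (3, (1.3, 0.4)),
    (4, (5.0, 5.0)), (5, (5.2, 5.1)),
]
live = expire(stream, now=5, effective_time=10)
clusters = grid_cluster(live, cell_size=1.0, min_pts=2)
```

In this sketch the two stale points fall outside the window and never form a cluster, while the adjacent dense cells near the origin merge into one cluster. A distributed version would additionally have to merge clusters that span partition boundaries, which this example does not attempt.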

References

  1. Amineh, A., Teh, Y.W., Mahmoud, R., et al.: A study of density-grid based clustering algorithms on data streams. In: Proceedings of the Eighth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), pp. 1652–1656 (2011)

  2. Jonathan, A.S., Elaine, R.F., Rodrigo, C.: Data stream clustering: a survey. ACM Comput. Surv. 46(1), 13:1–13:31 (2013)

  3. Shifei, D., Fulin, W., Jun, Q., Hongjie, J., Fengxiang, J.: Research on data stream clustering algorithms. Artif. Intell. Rev. 43, 593–600 (2015)

  4. Jeffrey, D., Sanjay, G.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)

  5. Guha, S., Mishra, N., Motwani, R.: Clustering data streams. In: Proceedings of the 41st Annual Symposium on Foundations of Computer Science, pp. 359–366 (2000)

  6. O'Callaghan, L., Meyerson, A., Motwani, R., Mishra, N., Guha, S.: Streaming data algorithms for high-quality clustering. In: Proceedings of the 18th International Conference on Data Engineering, pp. 685–704 (2002)

  7. Aggarwal, C., Han, J., Wang, J., Yu, P.: A framework for clustering evolving data streams. In: Proceedings of the 29th International Conference on Very Large Data Bases, pp. 81–92 (2003)

  8. Chen, Y., Tu, L.: Density-based clustering for real-time data stream. In: Proceedings of the ACM KDD 2007 Conference, pp. 133–142 (2007)

  9. Matei, Z., Mosharaf, C., Tathagata, D., et al.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, p. 2 (2012)

  10. Martin, E., et al.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD 1996), pp. 226–231 (1996)

  11. KDD dataset. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html

Author information

Correspondence to Yuan Zhang.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhang, Y., Zhang, J. (2016). A Density-Grid Based Clustering Algorithm on Data Stream Using Resilient Distributed Datasets. In: Khoury, R., Drummond, C. (eds) Advances in Artificial Intelligence. Canadian AI 2016. Lecture Notes in Computer Science, vol 9673. Springer, Cham. https://doi.org/10.1007/978-3-319-34111-8_38

  • DOI: https://doi.org/10.1007/978-3-319-34111-8_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-34110-1

  • Online ISBN: 978-3-319-34111-8

  • eBook Packages: Computer Science, Computer Science (R0)
