DistClusTree: A Framework for Distributed Stream Clustering

  • Conference paper
Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 10837))

Abstract

In this paper, we investigate the problem of clustering distributed multidimensional data streams. We devise a distributed clustering framework, DistClusTree, that extends the centralized ClusTree approach. The main difficulty in distributed clustering is balancing communication cost against clustering quality. DistClusTree addresses this by combining spatial index summaries with online tracking to perform efficient local and global incremental clustering. Extensive experiments demonstrate the efficacy of the framework in terms of communication cost and approximate clustering quality.
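The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each site keeps a single additive cluster feature (count, linear sum, squared sum) in place of a full ClusTree hierarchy, and that a site reports to the coordinator only when its centroid has drifted more than a threshold since the last report. The names `ClusterFeature`, `LocalSite`, `Coordinator`, and the parameter `delta` are hypothetical and chosen for illustration only.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ClusterFeature:
    """Additive summary of a point set: count, linear sum, squared sum."""
    n: int = 0
    ls: List[float] = field(default_factory=list)
    ss: List[float] = field(default_factory=list)

    def add(self, point: List[float]) -> None:
        if not self.ls:  # the first point fixes the dimensionality
            self.ls = [0.0] * len(point)
            self.ss = [0.0] * len(point)
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x

    def copy(self) -> "ClusterFeature":
        return ClusterFeature(self.n, list(self.ls), list(self.ss))

    def merge(self, other: "ClusterFeature") -> "ClusterFeature":
        # Summaries are additive, so merging is component-wise addition.
        if self.n == 0:
            return other.copy()
        if other.n == 0:
            return self.copy()
        return ClusterFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            [a + b for a, b in zip(self.ss, other.ss)],
        )

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]


class LocalSite:
    """Hypothetical site: sends a summary only after enough centroid drift."""

    def __init__(self, delta: float) -> None:
        self.delta = delta  # assumed drift threshold, not taken from the paper
        self.cf = ClusterFeature()
        self.last_sent: Optional[List[float]] = None

    def observe(self, point: List[float]) -> Optional[ClusterFeature]:
        self.cf.add(point)
        c = self.cf.centroid()
        if self.last_sent is None or math.dist(c, self.last_sent) > self.delta:
            self.last_sent = c
            return self.cf.copy()  # message for the coordinator
        return None  # drift is small: stay silent and save communication


class Coordinator:
    """Hypothetical coordinator: keeps the latest summary from each site."""

    def __init__(self) -> None:
        self.latest: Dict[str, ClusterFeature] = {}

    def receive(self, site_id: str, cf: ClusterFeature) -> None:
        self.latest[site_id] = cf

    def global_summary(self) -> ClusterFeature:
        total = ClusterFeature()
        for cf in self.latest.values():
            total = total.merge(cf)
        return total
```

A larger `delta` means fewer messages but a staler global summary, and a smaller `delta` means the opposite; this is the communication versus quality balance the abstract refers to, shown here in its simplest form.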

Author information

Corresponding author

Correspondence to Zhinoos Razavi Hesabi.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Razavi Hesabi, Z., Sellis, T., Liao, K. (2018). DistClusTree: A Framework for Distributed Stream Clustering. In: Wang, J., Cong, G., Chen, J., Qi, J. (eds) Databases Theory and Applications. ADC 2018. Lecture Notes in Computer Science, vol 10837. Springer, Cham. https://doi.org/10.1007/978-3-319-92013-9_23

  • DOI: https://doi.org/10.1007/978-3-319-92013-9_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-92012-2

  • Online ISBN: 978-3-319-92013-9

  • eBook Packages: Computer Science, Computer Science (R0)
