
Algorithms for k-median Clustering over Distributed Streams

  • Conference paper
Computing and Combinatorics (COCOON 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9797)

Abstract

We consider the k-median clustering problem over distributed streams. In the distributed streaming setting, there are multiple computational nodes, each of which receives a data stream, and the goal is to maintain, at all times, an approximation of a function of interest over the union of the local data at all the nodes. The approximation is maintained at a coordinator node, which has bidirectional communication channels to all the nodes. This model is also known as the distributed functional monitoring model. A natural variant of this model is the distributed sliding window model, where we are interested only in maintaining an approximation over a recent period of time.

This paper gives new algorithms for the k-median clustering problem in the distributed streaming model and its sliding-window counterpart.
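The model described in the abstract can be made concrete with a toy simulation. The Python sketch below is not the paper's algorithm: it is a naive baseline in which every site forwards every item to the coordinator, which then solves a tiny 1-D k-median instance by exhaustive search. The names `Site`, `Coordinator`, `kmedian_cost`, and `brute_force_kmedian`, the integer timestamps, and the count-based window are all assumptions made for illustration.

```python
from itertools import combinations


def kmedian_cost(points, centers):
    """k-median objective: sum of distances from each point to its nearest center (1-D)."""
    return sum(min(abs(p - c) for c in centers) for p in points)


def brute_force_kmedian(points, k):
    """Exact discrete k-median by exhaustive search; only viable for tiny inputs."""
    candidates = sorted(set(points))
    k = min(k, len(candidates))  # fewer distinct points than k: use them all
    return min(combinations(candidates, k), key=lambda cs: kmedian_cost(points, cs))


class Coordinator:
    """Hypothetical coordinator that stores every forwarded (timestamp, value) pair."""

    def __init__(self, k):
        self.k = k
        self.items = []
        self.messages = 0  # communication cost of this naive scheme

    def update(self, t, x):
        self.items.append((t, x))
        self.messages += 1

    def centers(self, now=None, window=None):
        """Centers over the whole stream, or over timestamps in (now - window, now]."""
        if now is None or window is None:
            pts = [x for _, x in self.items]
        else:
            pts = [x for t, x in self.items if now - window < t <= now]
        return brute_force_kmedian(pts, self.k)


class Site:
    """Hypothetical node: receives a local stream and forwards each item (naive)."""

    def __init__(self, coordinator):
        self.coordinator = coordinator

    def receive(self, t, x):
        self.coordinator.update(t, x)


coord = Coordinator(k=2)
site_a, site_b = Site(coord), Site(coord)
arrivals = [(site_a, 1.0), (site_b, 10.0), (site_a, 2.0),
            (site_b, 11.0), (site_a, 3.0), (site_b, 12.0)]
for t, (site, x) in enumerate(arrivals, start=1):
    site.receive(t, x)

print(coord.centers())                 # whole stream
print(coord.centers(now=6, window=2))  # sliding window over the last 2 time steps
print(coord.messages)                  # one message per item in this baseline
```

This baseline sends one message per arriving item, so its communication grows linearly with the stream length. Reducing that cost while keeping the clustering approximately correct is the kind of question the distributed functional monitoring model is designed to study.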

Notes

  1. Throughout this paper, "with high probability" means with probability at least \((1-\frac{1}{n})\).

Author information

Corresponding author

Correspondence to Sutanu Gayen.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Gayen, S., Vinodchandran, N.V. (2016). Algorithms for k-median Clustering over Distributed Streams. In: Dinh, T., Thai, M. (eds) Computing and Combinatorics. COCOON 2016. Lecture Notes in Computer Science, vol 9797. Springer, Cham. https://doi.org/10.1007/978-3-319-42634-1_43

  • DOI: https://doi.org/10.1007/978-3-319-42634-1_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42633-4

  • Online ISBN: 978-3-319-42634-1

  • eBook Packages: Computer Science (R0)
