Distributed data clustering in sensor networks

Distributed Computing

Abstract

Low-overhead analysis of large distributed data sets is necessary for current data centers and for future sensor networks. In such systems, each node holds some data value, e.g., a local sensor reading, and a concise picture of the global system state needs to be obtained. In resource-constrained environments like sensor networks, this needs to be done without collecting all the data at any location, i.e., in a distributed manner. To this end, we address the distributed clustering problem, in which numerous interconnected nodes compute a clustering of their data, i.e., partition these values into multiple clusters, and describe each cluster concisely. We present a generic algorithm that solves the distributed clustering problem and may be implemented in various topologies, using different clustering types. For example, the generic algorithm can be instantiated to cluster values according to distance, targeting the same problem as the famous k-means clustering algorithm. However, the distance criterion is often not sufficient to provide good clustering results. We present an instantiation of the generic algorithm that describes the values as a Gaussian Mixture (a set of weighted normal distributions), and uses machine learning tools for clustering decisions. Simulations show the robustness, speed and scalability of this algorithm. We prove that any implementation of the generic algorithm converges for any connected topology, clustering criterion and cluster representation, in fully asynchronous settings.
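The abstract does not spell out the algorithm's steps, so the following is only a minimal sketch of one way gossip-style distributed clustering could look, under stated assumptions. It assumes 1-D values, cluster summaries that are weighted Gaussians, a node that sends half of its weight to a random neighbor, and a receiver that merges the two clusters with the closest means until at most k remain. The names `Summary`, `merge`, `reduce_to_k`, `split` and `simulate` are hypothetical, and the closest-means rule is a simple distance criterion standing in for the machine learning tools the abstract mentions.

```python
import random
from dataclasses import dataclass


@dataclass
class Summary:
    """A weighted 1-D Gaussian describing one cluster (hypothetical type)."""
    weight: float
    mean: float
    var: float


def merge(a, b):
    """Moment-preserving merge of two weighted Gaussians into one."""
    w = a.weight + b.weight
    mean = (a.weight * a.mean + b.weight * b.mean) / w
    var = (a.weight * (a.var + (a.mean - mean) ** 2)
           + b.weight * (b.var + (b.mean - mean) ** 2)) / w
    return Summary(w, mean, var)


def reduce_to_k(clusters, k):
    """Merge the pair with the closest means until at most k clusters remain."""
    clusters = sorted(clusters, key=lambda c: c.mean)
    while len(clusters) > k:
        # In 1-D the closest pair of means is adjacent after sorting,
        # and a merged mean lies between the two, so order is preserved.
        i = min(range(len(clusters) - 1),
                key=lambda j: clusters[j + 1].mean - clusters[j].mean)
        clusters[i:i + 2] = [merge(clusters[i], clusters[i + 1])]
    return clusters


def split(clusters):
    """Halve every weight: keep one half locally, send the other half."""
    kept = [Summary(c.weight / 2, c.mean, c.var) for c in clusters]
    sent = [Summary(c.weight / 2, c.mean, c.var) for c in clusters]
    return kept, sent


def simulate(values, neighbors, k, rounds, seed=0):
    """Run asynchronous pairwise exchanges; each node starts with its own value."""
    rng = random.Random(seed)
    state = [[Summary(1.0, v, 0.0)] for v in values]
    for _ in range(rounds):
        sender = rng.randrange(len(values))
        receiver = rng.choice(neighbors[sender])
        state[sender], sent = split(state[sender])
        state[receiver] = reduce_to_k(state[receiver] + sent, k)
    return state


if __name__ == "__main__":
    values = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
    ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    for node, clusters in enumerate(simulate(values, ring, k=2, rounds=200)):
        print(node, [(round(c.weight, 3), round(c.mean, 3)) for c in clusters])
```

In this sketch the total weight in the system stays equal to the number of nodes, since weight is only split and merged, never created or dropped. The number of rounds is kept modest because repeated halving would eventually underflow floating-point weights.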

References

  1. Asada, G., Dong, M., Lin, T., Newberg, F., Pottie, G., Kaiser, W., Marcy, H.: Wireless integrated network sensors: low power systems on a chip. In: ESSCIRC, Elsevier, The Hague (1998)

  2. Birk, Y., Liss, L., Schuster, A., Wolff, R.: A local algorithm for ad hoc majority voting via charge fusion. In: DISC, Springer, Heidelberg (2004)

  3. Boyd, S.P., Ghosh, A., Prabhakar, B., Shah, D.: Gossip algorithms: design, analysis and applications. In: INFOCOM, IEEE, Miami (2005)

  4. Datta, S., Giannella, C., Kargupta, H.: K-means clustering over a large, dynamic network. In: SDM, SIAM (2006)

  5. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc. 39(1), 1–38 (1977). http://www.jstor.org/stable/2984875

  6. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley-Interscience, New York (2000)

  7. Eugster, P.T., Guerraoui, R., Handurukande, S.B., Kouznetsov, P., Kermarrec, A.-M.: Lightweight probabilistic broadcast. ACM Trans. Comput. Syst. 21(4), 341–374 (2003)

  8. Eyal, I., Keidar, I., Rom, R.: Distributed clustering for robust aggregation in large networks. In: HotDep, IEEE (2009)

  9. Eyal, I., Keidar, I., Rom, R.: Distributed data classification in sensor networks. In: PODC, ACM (2010)

  10. Flajolet, P., Martin, G.N.: Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci. 31(2), 182–209 (1985)

  11. Gurevich, M., Keidar, I.: Correctness of gossip-based membership under message loss. SIAM J. Comput. 39(8), 3830–3859 (2010)

  12. Haridasan, M., van Renesse, R.: Gossip-based distribution estimation in peer-to-peer networks. In: International Workshop on Peer-to-Peer Systems (IPTPS 08) (2008)

  13. Heller, J.: Catch-22. Simon & Schuster, New York (1961)

  14. Kempe, D., Dobra, A., Gehrke, J.: Gossip-based computation of aggregate information. In: FOCS, IEEE Computer Society, Los Alamitos (2003)

  15. Kowalczyk, W., Vlassis, N.A.: Newscast EM. In: NIPS (2004)

  16. MacQueen, J.B.: Some methods of classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (1967)

  17. Jelasity, M., Voulgaris, S., Guerraoui, R., Kermarrec, A.-M., van Steen, M.: Gossip-based peer sampling. ACM Trans. Comput. Syst. 25(3) (2007)

  18. Nath, S., Gibbons, P.B., Seshan, S., Anderson, Z.R.: Synopsis diffusion for robust aggregation in sensor networks. In: SenSys, ACM, New York (2004)

  19. Sacha, J., Napper, J., Stratan, C., Pierre, G.: Reliable distribution estimation in decentralised environments. Submitted for Publication (2009)

  20. Salmond, D.J.: Mixture reduction algorithms for uncertain tracking. Tech. rep., RAE Farnborough (UK) (1988)

  21. Warneke, B., Last, M., Liebowitz, B., Pister, K.: Smart dust: communicating with a cubic-millimeter computer. Computer 34(1) (2001). doi:10.1109/2.895117

Author information

Corresponding author

Correspondence to Ittay Eyal.

Additional information

A preliminary version of this paper appears in the proceedings of the 29th Symposium on Principles of Distributed Computing (PODC) [9].

About this article

Cite this article

Eyal, I., Keidar, I. & Rom, R. Distributed data clustering in sensor networks. Distrib. Comput. 24, 207–222 (2011). https://doi.org/10.1007/s00446-011-0143-7

  • DOI: https://doi.org/10.1007/s00446-011-0143-7
