A clustering approach for sampling data streams in sensor networks

  • Regular paper
  • Published in: Knowledge and Information Systems

Abstract

The growing use of embedded devices and sensors in daily life is profoundly reshaping the way we interact with our environment and our peers. As sensors pervade future cities, increasingly efficient infrastructures will be required to collect, process and store massive data streams from a wide variety of sources. Despite differences in application-specific features and hardware platforms, sensor network applications share a common goal: periodically sampling data collected from different sensors and storing it in a common persistent memory. In this article, we present a clustering approach for rapidly and efficiently computing the sampling rate that minimizes the sum of squared errors (SSE) for each sensor in a network. To evaluate the efficiency of the proposed approach, we carried out experiments on real electric power consumption data streams provided by EDF (Électricité de France).
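As a rough illustration of the objective named in the abstract, the sketch below measures the SSE incurred when one sensor's series is kept only at every `step`-th reading, and then picks the coarsest step that stays within an error budget. This is not the authors' clustering method, which the abstract does not detail. The function names, the hold-last-value reconstruction, and the `max_sse` budget are all assumptions made for the example.

```python
from typing import List, Sequence


def reconstruct(values: Sequence[float], step: int) -> List[float]:
    """Keep every `step`-th reading and hold it until the next kept one.

    Assumption: hold-last-value reconstruction; other interpolation
    schemes would give different errors.
    """
    return [values[(i // step) * step] for i in range(len(values))]


def sse(values: Sequence[float], step: int) -> float:
    """Sum of squared errors between the series and its reconstruction."""
    approx = reconstruct(values, step)
    return sum((v - a) ** 2 for v, a in zip(values, approx))


def coarsest_step_within_budget(
    values: Sequence[float],
    candidate_steps: Sequence[int],
    max_sse: float,
) -> int:
    """Return the largest candidate step whose SSE is at most `max_sse`.

    Falls back to step 1 (keep every reading) if no candidate qualifies.
    """
    feasible = [s for s in candidate_steps if sse(values, s) <= max_sse]
    return max(feasible, default=1)


# Hypothetical readings: a piecewise-constant signal tolerates coarse
# sampling, while a rapidly alternating one does not.
flat = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
noisy = [0.0, 4.0, 0.0, 4.0, 0.0, 4.0, 0.0, 4.0]

flat_step = coarsest_step_within_budget(flat, [1, 2, 4, 8], max_sse=0.0)
noisy_step = coarsest_step_within_budget(noisy, [1, 2, 4], max_sse=10.0)
```

In this toy setting the slowly varying series can be sampled far more coarsely than the alternating one for the same error budget, which is the intuition behind choosing a sampling rate per sensor.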



Author information

Correspondence to Alzennyr da Silva.

About this article

Cite this article

da Silva, A., Chiky, R. & Hébrail, G. A clustering approach for sampling data streams in sensor networks. Knowl Inf Syst 32, 1–23 (2012). https://doi.org/10.1007/s10115-011-0448-7

  • DOI: https://doi.org/10.1007/s10115-011-0448-7
