Abstract
The growing use of embedded devices and sensors in daily life is profoundly reshaping the way we interact with our environment and our peers. As sensors pervade future cities, increasingly efficient infrastructures will be required to collect, process and store massive data streams from a wide variety of sources. Despite their differing application-specific features and hardware platforms, sensor network applications share a common goal: to periodically sample data collected from different sensors and store it in a common persistent memory. In this article, we present a clustering approach for rapidly and efficiently computing, for each sensor in a network, the sampling rate that minimizes the sum of squared errors (SSE). To evaluate the efficiency of the proposed approach, we carried out experiments on real electric power consumption data streams provided by EDF (Électricité de France).
Cite this article
da Silva, A., Chiky, R. & Hébrail, G. A clustering approach for sampling data streams in sensor networks. Knowl Inf Syst 32, 1–23 (2012). https://doi.org/10.1007/s10115-011-0448-7