
L2GClust: local-to-global clustering of stream sources

Published: 21 March 2011

Abstract

In ubiquitous streaming data sources, such as sensor networks, clustering nodes by the data they produce is an important problem, since it gives insight into the phenomenon the network monitors. However, if clustering techniques require data to be gathered centrally, communication and storage requirements are often unbounded. The goal of this paper is to assess the feasibility of computing a local clustering at each node, using only its neighbors' centroids, as an approximation of the global clustering computed by a centralized process. A local algorithm is proposed to cluster sensors based on the moving average of each node's data over time: the moving average of each node is approximated using a memory-less fading average, and clustering is based on the furthest point algorithm applied to the centroids computed by the node's direct neighbors. The algorithm was evaluated on a state-of-the-art sensor network simulator, measuring the agreement between local and global clustering. Experiments on synthetic data with spherical Gaussian clusters are analyzed across different network sizes, numbers of clusters, and degrees of cluster overlap. Results show a high level of agreement between each node's clustering definition and the global clustering definition, with special emphasis on separability agreement. Overall, local approaches keep a good approximation of the global clustering while improving privacy among nodes and decreasing communication and computation load in the network. Hence, the basic requirements for distributed clustering of streaming data sensors recommend that clustering in these settings be performed locally.
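The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the fading factor `alpha`, the class and function names, the choice of the first point as the initial center, and the idea of pooling a node's own estimate with its neighbors' centroids are all assumptions made for illustration. The fading average keeps only a faded sum and a faded count, so no past observations are stored. The furthest point step is written as the usual farthest-first traversal, which repeatedly picks the point farthest from the centers chosen so far.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


class FadingAverage:
    """Memory-less fading average (hypothetical helper).

    Only two numbers are kept per stream: a faded sum and a faded count.
    alpha = 1.0 reduces to the ordinary running mean; smaller values
    give more weight to recent observations.
    """

    def __init__(self, alpha: float = 0.9) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.faded_sum = 0.0
        self.faded_count = 0.0

    def update(self, x: float) -> float:
        # Fade the previous totals, then add the new observation.
        self.faded_sum = x + self.alpha * self.faded_sum
        self.faded_count = 1.0 + self.alpha * self.faded_count
        return self.faded_sum / self.faded_count


def furthest_point_centers(points: List[Point], k: int) -> List[Point]:
    """Pick up to k centers by farthest-first traversal.

    Assumption: the first point seeds the traversal. In a local setting,
    `points` could be the centroids received from direct neighbors plus
    the node's own fading-average estimate.
    """
    if k < 1 or not points:
        return []
    centers = [points[0]]
    # Distance from every point to its nearest chosen center so far.
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        idx = max(range(len(points)), key=nearest.__getitem__)
        if nearest[idx] == 0.0:
            break  # all remaining points coincide with a chosen center
        centers.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx]))
                   for d, p in zip(nearest, points)]
    return centers


def assign(point: Point, centers: List[Point]) -> int:
    """Return the index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


if __name__ == "__main__":
    estimate = FadingAverage(alpha=0.5)
    for reading in (2.0, 4.0):
        current = estimate.update(reading)
    pool = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (10.0, 1.0)]
    centers = furthest_point_centers(pool, k=2)
    print(current, centers, assign((9.0, 0.0), centers))
```

Under these assumptions each node needs only constant memory per stream and a single pass over the small pool of neighbor centroids, which is in line with the abstract's argument for keeping the computation local.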


Cited By

  • (2015) An Online Learning-Based Adaptive Biometric System. Adaptive Biometric Systems, pages 73-96. DOI: 10.1007/978-3-319-24865-3_5. Online publication date: 22-Oct-2015.
  • (2012) Mobile Data Stream Mining. Proceedings of the 2012 IEEE 13th International Conference on Mobile Data Management (mdm 2012), pages 360-363. DOI: 10.1109/MDM.2012.37. Online publication date: 23-Jul-2012.

Published In

SAC '11: Proceedings of the 2011 ACM Symposium on Applied Computing
March 2011
1868 pages
ISBN:9781450301138
DOI:10.1145/1982185
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clustering sources
  2. local algorithms
  3. ubiquitous streams

Qualifiers

  • Research-article

Conference

SAC'11: The 2011 ACM Symposium on Applied Computing
March 21 - 24, 2011
TaiChung, Taiwan
