A single pass algorithm for clustering evolving data streams based on swarm intelligence

Data Mining and Knowledge Discovery

Abstract

Existing density-based data stream clustering algorithms use a two-phase scheme: an online phase, in which raw data is processed to gather summary statistics, and an offline phase that generates the clusters from the summary data. In this article we propose a data stream clustering method based on a multi-agent system that uses a decentralized, bottom-up, self-organizing strategy to group similar data points. Data points are associated with agents and deployed onto a 2D space, where they work simultaneously by applying a heuristic strategy based on a bio-inspired model known as the flocking model. Agents move in the space for a fixed time and, when they encounter other agents within a predefined visibility range, they can decide to form a flock if they are similar. Flocks can join to form swarms of similar groups. Because a swarm represents a cluster, this strategy merges the two phases of density-based approaches and thus avoids the computationally demanding offline cluster computation. Experimental results show that the bio-inspired approach can obtain very good results on real and synthetic data sets.
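
The grouping idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function name `flock_clusters`, the parameters `steps`, `visibility`, `max_dist`, and `size`, the use of Euclidean distance as the similarity test, and the random-walk movement (in place of the flocking model's steering rules) are all assumptions made for illustration. Each data point gets an agent at a random 2D position; agents that meet within the visibility range and carry similar data points are merged into one group.

```python
import math
import random


def find(parent, i):
    """Return the representative of i's group, compressing the path."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def flock_clusters(points, steps=50, visibility=1.5, max_dist=1.0,
                   size=10.0, seed=0):
    """Assign a group id to each data point (hypothetical helper).

    Assumptions, not taken from the article: agents move by a random
    walk on a size x size square, and two data points count as similar
    when their Euclidean distance is at most max_dist.
    """
    rng = random.Random(seed)
    n = len(points)
    # One agent per data point, placed at a random 2D position.
    pos = [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(n)]
    parent = list(range(n))  # union-find forest: one group per agent

    for _ in range(steps):
        # Agents that see each other and carry similar points join up.
        for i in range(n):
            for j in range(i + 1, n):
                close = math.dist(pos[i], pos[j]) <= visibility
                similar = math.dist(points[i], points[j]) <= max_dist
                if close and similar:
                    parent[find(parent, i)] = find(parent, j)
        # Move every agent one random step, clamped to the square.
        pos = [(min(size, max(0.0, x + rng.uniform(-1, 1))),
                min(size, max(0.0, y + rng.uniform(-1, 1))))
               for x, y in pos]

    return [find(parent, i) for i in range(n)]
```

In this sketch the groups are read directly from the merged agents, so no separate offline clustering step is run. Agents only merge when their data points are within `max_dist`, so two sets of points separated by a gap larger than `max_dist` never share a group id, whatever paths the agents take.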

Author information

Corresponding author

Correspondence to Clara Pizzuti.

Additional information

Responsible editor: Charu Aggarwal.

About this article

Cite this article

Forestiero, A., Pizzuti, C. & Spezzano, G. A single pass algorithm for clustering evolving data streams based on swarm intelligence. Data Min Knowl Disc 26, 1–26 (2013). https://doi.org/10.1007/s10618-011-0242-x

  • DOI: https://doi.org/10.1007/s10618-011-0242-x
