DOI: 10.1145/2949689.2949696

Framework for real-time clustering over sliding windows

Published: 18 July 2016

Abstract

Clustering queries over sliding windows require maintaining cluster memberships that change as the windows slide. To address this, the Generic 2-phase Continuous Summarization framework (G2CS) uses a generation-based window maintenance approach in which windows are maintained over different time intervals. It provides algorithm-independent and efficient sliding mechanisms for clustering queries, where the clustering algorithms are defined in terms of queries over cluster data represented as temporal tables. A particular challenge in real-time detection of a large number of rapidly evolving clusters is supporting smooth re-clustering efficiently, i.e., keeping the sliding time low as window sizes increase and strides decrease. To support such re-clustering efficiently for clustering algorithms that do not support deletion of expired data, e.g., BIRCH, G2CS includes a novel window maintenance mechanism called Sliding Binary Merge (SBM), which maintains several generations of intermediate window instances and does not require decremental cluster maintenance. To improve real-time sliding performance, G2CS uses generation-based multi-dimensional indexing. Extensive performance evaluation on both synthetic and real data shows that G2CS scales substantially better than related approaches.
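
The idea of sliding a window without decremental maintenance can be illustrated with a minimal sketch. This is not the paper's SBM mechanism, whose details are not given in the abstract. It is a simpler, well-known pane-based scheme, shown here only as an assumption-laden illustration: each stride is summarized by an additive, BIRCH-style clustering feature (count, linear sum, sum of squares), and the window summary is rebuilt by merging the live per-stride summaries, so an expired stride is simply dropped rather than subtracted. The names `CF` and `PaneWindow` are hypothetical, and the points are one-dimensional for brevity.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class CF:
    """Additive summary of 1-D points: count, linear sum, sum of squares."""
    n: int = 0
    ls: float = 0.0
    ss: float = 0.0

    def add(self, x: float) -> "CF":
        # Insert-only update; no deletion operation is defined.
        return CF(self.n + 1, self.ls + x, self.ss + x * x)

    def merge(self, other: "CF") -> "CF":
        # Summaries are additive, so merging is component-wise addition.
        return CF(self.n + other.n, self.ls + other.ls, self.ss + other.ss)

    def centroid(self) -> float:
        return self.ls / self.n


class PaneWindow:
    """Hypothetical pane-based window: one CF per stride, re-merged on each slide."""

    def __init__(self, panes_per_window: int):
        # A bounded deque silently drops the oldest pane when a new one arrives.
        self.panes = deque(maxlen=panes_per_window)

    def slide(self, new_points) -> CF:
        pane = CF()
        for x in new_points:
            pane = pane.add(x)
        self.panes.append(pane)
        # Rebuild the window summary from live panes only (no subtraction).
        total = CF()
        for p in self.panes:
            total = total.merge(p)
        return total


if __name__ == "__main__":
    w = PaneWindow(panes_per_window=2)
    w.slide([1.0, 3.0])
    w.slide([5.0])
    print(w.slide([7.0]).centroid())  # the window now covers only 5.0 and 7.0
```

Re-merging every pane on each slide costs time proportional to the number of panes per window, which is the kind of overhead that grows as strides shrink relative to the window size.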

Published In

SSDBM '16: Proceedings of the 28th International Conference on Scientific and Statistical Database Management
July 2016
290 pages

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Clustering
  2. Framework
  3. Sliding windows

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SSDBM '16

Acceptance Rates

Overall Acceptance Rate 56 of 146 submissions, 38%

Cited By

  • (2024) Ocean: Online Clustering and Evolution Analysis for Dynamic Streaming Data. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 4504-4517. DOI: 10.1109/ICDE60146.2024.00343. Online publication date: 13-May-2024
  • (2022) clusTransition: An R package for monitoring transition in cluster solutions of temporal datasets. PLOS ONE, 17:12, e0278146. DOI: 10.1371/journal.pone.0278146. Online publication date: 15-Dec-2022
  • (2021) Improvement of CluStream Algorithm Using Sliding Window for the Clustering of Data Streams. 2021 11th International Conference on Computer Engineering and Knowledge (ICCKE), pp. 434-440. DOI: 10.1109/ICCKE54056.2021.9721505. Online publication date: 28-Oct-2021
  • (2021) Applications of monitoring and tracing the evolution of clustering solutions in dynamic datasets. Journal of Applied Statistics, pp. 1-19. DOI: 10.1080/02664763.2021.2008882. Online publication date: 7-Dec-2021
  • (2021) A Hybrid Sliding Window Based Method for Stream Classification. Knowledge Discovery, Knowledge Engineering and Knowledge Management, pp. 94-107. DOI: 10.1007/978-3-030-66196-0_5. Online publication date: 14-Jan-2021
  • (2020) A Survey of Real-Time Big Data Processing Algorithms. Reliability and Risk Assessment in Engineering, pp. 3-10. DOI: 10.1007/978-981-15-3746-2_1. Online publication date: 9-May-2020
  • (2020) Tools and Techniques for Streaming Data. Ontology‐Based Information Retrieval for Healthcare Systems, pp. 313-330. DOI: 10.1002/9781119641391.ch15. Online publication date: 28-Jul-2020
  • (2019) ImpSlidingWindow: Kayan Pencere Tabanlı Akan Veri Özetleme Yönteminin Performansını Arttırmaya Yönelik Yeni Bir Model. European Journal of Science and Technology, pp. 292-301. DOI: 10.31590/ejosat.638096. Online publication date: 31-Oct-2019
  • (2019) K-boyutlu ağaç ve uyarlanabilir yarıçap (KD-AR Stream) tabanlı gerçek zamanlı akan veri kümeleme / Kd-tree and adaptive radius (KD-AR Stream) based real-time data stream clustering. Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi, 35:1, pp. 337-354. DOI: 10.17341/gazimmfd.467226. Online publication date: 25-Oct-2019
  • (2018) Efficient Data Stream Clustering With Sliding Windows Based on Locality-Sensitive Hashing. IEEE Access, 6, pp. 63757-63776. DOI: 10.1109/ACCESS.2018.2877138. Online publication date: 2018
