DOI: 10.1145/2882903.2915223
research-article

Graph Stream Summarization: From Big Bang to Big Crunch

Published: 14 June 2016

Abstract

A graph stream is a graph whose edges are updated sequentially in the form of a stream; it has important applications in cyber security and social networks. Because of the sheer volume and highly dynamic nature of graph streams, the practical way to handle them is summarization. Given a graph stream G, directed or undirected, the problem of graph stream summarization is to summarize G as S_G using much smaller (sublinear) space, linear construction time, and constant maintenance cost per edge update, such that S_G allows many queries over G to be answered approximately and efficiently. The widely used practice for summarizing data streams is to treat each stream element independently, e.g., with hash- or sample-based methods, without maintaining the connections (or relationships) between elements. Hence, existing methods can solve only ad hoc problems and do not support diversified and complicated analytics over graph streams. We present TCM, a novel generalized graph stream summary. Given an incoming edge, it summarizes both node and edge information in constant time. Consequently, the summary forms a graphical sketch in which edges capture the connections inside elements and nodes maintain relationships across elements. We discuss a wide range of supported queries and establish some error bounds. In addition, we show experimentally that TCM can effectively and efficiently support analytics over graph streams, which demonstrates its potential to start a new line of research and applications in graph stream management.
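The abstract does not spell out the data structure, so the following Python sketch illustrates only the general idea of a graphical sketch: nodes are hashed into a small number of buckets, and each incoming edge is added to a compact adjacency matrix in constant time. The class name `GraphSketch`, the parameters `width` and `depth`, the BLAKE2-based hashing, and the min-over-copies estimator (borrowed from count-min style sketches) are all assumptions made for illustration; this is not the authors' implementation.

```python
import hashlib


class GraphSketch:
    """Illustrative graph stream summary (hypothetical, not the paper's code).

    Keeps `depth` independent width-by-width matrices. Each matrix uses its
    own hash function to map node labels to bucket indices.
    """

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.matrices = [
            [[0] * width for _ in range(width)] for _ in range(depth)
        ]

    def _bucket(self, node, i):
        # Salt the hash with the matrix index so each copy hashes differently.
        digest = hashlib.blake2b(f"{i}:{node}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def update(self, src, dst, weight=1):
        # One cell per matrix is touched, so the cost per edge is O(depth).
        for i, matrix in enumerate(self.matrices):
            matrix[self._bucket(src, i)][self._bucket(dst, i)] += weight

    def edge_weight(self, src, dst):
        # Collisions can only inflate a cell (for non-negative weights),
        # so the minimum over copies is the tightest estimate available.
        return min(
            matrix[self._bucket(src, i)][self._bucket(dst, i)]
            for i, matrix in enumerate(self.matrices)
        )

    def out_weight(self, src):
        # Estimated total weight of edges leaving src: sum of its bucket row.
        return min(
            sum(matrix[self._bucket(src, i)])
            for i, matrix in enumerate(self.matrices)
        )


sketch = GraphSketch(width=64, depth=4)
for src, dst, weight in [("a", "b", 3), ("a", "c", 2), ("b", "c", 1)]:
    sketch.update(src, dst, weight)
print(sketch.edge_weight("a", "b"), sketch.out_weight("a"))
```

Because the buckets form a small graph of their own, structural queries such as node aggregates can be run on the matrices directly, which is what separates this kind of summary from sketches that treat each stream element independently. With non-negative weights the estimates in this sketch never fall below the true values.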



Published In

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016
2300 pages
ISBN: 9781450335317
DOI: 10.1145/2882903
Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. data streams
  2. graph streams
  3. sketch
  4. summarization

Conference

SIGMOD/PODS'16: International Conference on Management of Data
June 26 - July 1, 2016
San Francisco, California, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 67
  • Downloads (Last 6 weeks): 6
Reflects downloads up to 20 Feb 2025

Cited By

  • (2024) Anomaly Detection over Streaming Graphs with Finger-Based Higher-Order Graph Sketch. Mathematics 12:19 (3092). DOI: 10.3390/math12193092. Online publication date: 2-Oct-2024
  • (2024) Improving Graph Compression for Efficient Resource-Constrained Graph Analytics. Proceedings of the VLDB Endowment 17:9 (2212-2226). DOI: 10.14778/3665844.3665852. Online publication date: 1-May-2024
  • (2024) A Two-stage Coarsening Method for a Streaming Graph with Preserving Key Features. Proceedings of the 2024 International Conference on Generative Artificial Intelligence and Information Security (253-260). DOI: 10.1145/3665348.3665392. Online publication date: 10-May-2024
  • (2024) Play like a Vertex: A Stackelberg Game Approach for Streaming Graph Partitioning. Proceedings of the ACM on Management of Data 2:3 (1-27). DOI: 10.1145/3654965. Online publication date: 30-May-2024
  • (2024) Graph Summarization: Compactness Meets Efficiency. Proceedings of the ACM on Management of Data 2:3 (1-26). DOI: 10.1145/3654943. Online publication date: 30-May-2024
  • (2024) SsAG: Summarization and Sparsification of Attributed Graphs. ACM Transactions on Knowledge Discovery from Data 18:6 (1-22). DOI: 10.1145/3651619. Online publication date: 12-Apr-2024
  • (2024) Node Embedding Preserving Graph Summarization. ACM Transactions on Knowledge Discovery from Data 18:6 (1-19). DOI: 10.1145/3649505. Online publication date: 12-Apr-2024
  • (2024) Learning to Sketch: A Neural Approach to Item Frequency Estimation in Streaming Data. IEEE Transactions on Pattern Analysis and Machine Intelligence 46:11 (7136-7153). DOI: 10.1109/TPAMI.2024.3388589. Online publication date: Nov-2024
  • (2024) $\mathcal{FR}_{B}$-Sketch: A Graph Stream Summarization Model Based on Pattern and Rank. 2024 5th International Conference on Electronic Communication and Artificial Intelligence (ICECAI) (436-440). DOI: 10.1109/ICECAI62591.2024.10675249. Online publication date: 31-May-2024
  • (2024) Streaming Local Community Detection Through Approximate Conductance. IEEE Transactions on Big Data 10:1 (12-22). DOI: 10.1109/TBDATA.2023.3310251. Online publication date: Feb-2024
