Research article, DOI: 10.1145/2939672.2939762

Compact and Scalable Graph Neighborhood Sketching

Published: 13 August 2016

Abstract

The all-distances sketch (ADS) has recently emerged as a promising paradigm for graph neighborhood sketching. An ADS is a probabilistic data structure defined for each vertex of a graph. ADSs support estimation of many useful network-analysis indicators with accuracy guarantees, and the ADSs for all vertices in a graph can be computed in near-linear time. Because of these properties, ADSs have attracted considerable attention. However, a critical drawback of ADSs is their space requirement, which tends to be much larger than that of the graph itself. In the present study, we address this issue by designing a new graph sketching scheme, sketch retrieval shortcuts (SRSs). Although SRSs are more space-efficient than ADSs by an order of magnitude, the ADS of any vertex can be quickly retrieved from the SRSs. The retrieved ADSs can be used to estimate the aforementioned indicators in exactly the same manner as plain ADSs, inheriting the same accuracy guarantee. Our experiments on real-world networks demonstrate the usefulness of SRSs as a practical back end for large-scale graph data mining.
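
To make the idea of a per-vertex sketch concrete, the following is a minimal Python sketch of a bottom-k ADS on an unweighted graph, together with a HIP-style neighborhood-size estimate in the spirit of Cohen's estimators. It is built directly from the definition (one BFS per vertex), not with a near-linear-time algorithm, and it does not implement SRS. The function names, the toy graph, and the tie-breaking of equal distances by BFS discovery order are assumptions made for illustration; none of them comes from the paper.

```python
import heapq
import random
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source; dict order is BFS discovery order."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_ads(adj, source, rank, k):
    """Bottom-k ADS of source as (distance, vertex, hip_weight) triples.

    A vertex enters the sketch if its rank is below the k-th smallest
    rank among the vertices scanned before it (threshold 1.0 while
    fewer than k have been scanned). Assumed tie-break: vertices at
    equal distance are scanned in BFS discovery order.
    """
    dist = bfs_distances(adj, source)
    smallest = []  # negated ranks: a max-heap of the k smallest so far
    ads = []
    for v in dist:  # nondecreasing distance from source
        threshold = 1.0 if len(smallest) < k else -smallest[0]
        if rank[v] < threshold:
            # HIP-style adjusted weight: inverse inclusion probability.
            ads.append((dist[v], v, 1.0 / threshold))
            heapq.heappush(smallest, -rank[v])
            if len(smallest) > k:
                heapq.heappop(smallest)  # drop the largest rank
    return ads


def estimate_neighborhood_size(ads, d):
    """Estimated number of vertices within distance d of the source."""
    return sum(weight for dist, _, weight in ads if dist <= d)


# Hypothetical toy graph (undirected, adjacency lists).
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
rng = random.Random(0)
rank = {v: rng.random() for v in adj}  # one shared random rank per vertex

ads = build_ads(adj, 0, rank, k=2)
print(ads)
print(estimate_neighborhood_size(ads, 2))
```

When k is at least the number of vertices, every reachable vertex enters the sketch with weight 1, so the estimate equals the exact count; a smaller k shrinks the sketch at the cost of estimation variance.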

Supplementary Material

MP4 File (kdd2016_akiba_scalable_graph_01-acm.mp4)

    Published In

    KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2016
    2176 pages
    ISBN:9781450342322
    DOI:10.1145/2939672
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. all-distances sketches
    2. graphs
    3. min-hash sketches

    Qualifiers

    • Research-article

    Conference

    KDD '16

    Acceptance Rates

    KDD '16 Paper Acceptance Rate 66 of 1,115 submissions, 6%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 18
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 17 Jan 2025.

    Citations

    Cited By

    • (2024) A Unified Framework for Mining Batch and Periodic Batch in Data Streams. IEEE Transactions on Knowledge and Data Engineering, 36(11):5544-5561. DOI: 10.1109/TKDE.2024.3399024. Online publication date: Nov 2024.
    • (2023) MicroscopeSketch: Accurate Sliding Estimation Using Adaptive Zooming. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2660-2671. DOI: 10.1145/3580305.3599432. Online publication date: 6 Aug 2023.
    • (2023) HyperCalm Sketch: One-Pass Mining Periodic Batches in Data Streams. 2023 IEEE 39th International Conference on Data Engineering (ICDE), pages 14-26. DOI: 10.1109/ICDE55515.2023.00009. Online publication date: Apr 2023.
    • (2020) V-Combiner. Proceedings of the 34th ACM International Conference on Supercomputing, pages 1-13. DOI: 10.1145/3392717.3392739. Online publication date: 29 Jun 2020.
    • (2019) Privacy-Preserving Sketching for Online Social Network Data Publication. 2019 16th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON), pages 1-9. DOI: 10.1109/SAHCN.2019.8824823. Online publication date: 10 Jun 2019.
    • (2017) Random-radius ball method for estimating closeness centrality. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pages 125-131. DOI: 10.5555/3298239.3298259. Online publication date: 4 Feb 2017.
