Sketching Data Structures for Massive Graph Problems

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11470)

Abstract

In this work, we explore the application of sketching data structures to problems on graphs that do not fit entirely in memory. These structures allow compact representations of data at the cost of some probability of failure. We focus on two problems: implicit representation and dynamic connectivity. Our contributions include two new probabilistic implicit representations: one uses Bloom filters and represents sparse graphs with O(|E|) bits, and the other uses MinHash sketches and represents trees with O(|V|) bits. We also describe a variant of an \(\ell_0\)-sampling sketch for which a tighter upper bound on the failure probability of sampling can be proved.
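
To make the Bloom filter idea in the abstract concrete, the following is a minimal sketch of a probabilistic edge set for an undirected graph. It is not the authors' construction, whose details are not given here; the class name `BloomEdgeSet`, the method names, the integer vertex labels, and the SHA-256 double-hashing scheme are all assumptions made for illustration. For a fixed false-positive rate the filter spends a constant number of bits per edge, which is consistent with the O(|E|) bound stated above. An adjacency query never misses an inserted edge, but it may report an edge that was never inserted.

```python
import hashlib
import math


class BloomEdgeSet:
    """Hypothetical Bloom-filter edge set: no false negatives, tunable false positives."""

    def __init__(self, expected_edges, false_positive_rate=0.01):
        # Standard Bloom filter sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        n = max(1, expected_edges)
        self.m = max(8, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, u, v):
        # Undirected edge: order the endpoints so (u, v) and (v, u) hash alike.
        a, b = (u, v) if u <= v else (v, u)
        digest = hashlib.sha256(f"{a},{b}".encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        # Double hashing derives k bit positions from two base hashes.
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add_edge(self, u, v):
        for pos in self._positions(u, v):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def probably_adjacent(self, u, v):
        # True for every inserted edge; may also be True for a non-edge.
        return all((self.bits[pos // 8] >> (pos % 8)) & 1 for pos in self._positions(u, v))


if __name__ == "__main__":
    g = BloomEdgeSet(expected_edges=3)
    for u, v in [(0, 1), (1, 2), (2, 3)]:
        g.add_edge(u, v)
    print(g.probably_adjacent(1, 0))  # True: inserted edges are always reported
```

Lowering `false_positive_rate` increases the bits per edge only logarithmically, so the total size stays linear in the number of edges.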

Acknowledgements

The authors acknowledge partial financial support from CNPq, CAPES, and a FAPERJ BBP grant.

Author information

Corresponding author

Correspondence to Juan P. A. Lopes.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Lopes, J.P.A., Oliveira, F.S., Pinto, P.E.D., Barbosa, V.C. (2019). Sketching Data Structures for Massive Graph Problems. In: Gadepally, V., Mattson, T., Stonebraker, M., Wang, F., Luo, G., Teodoro, G. (eds) Heterogeneous Data Management, Polystores, and Analytics for Healthcare. DMAH 2018, Poly 2018. Lecture Notes in Computer Science, vol 11470. Springer, Cham. https://doi.org/10.1007/978-3-030-14177-6_5

  • DOI: https://doi.org/10.1007/978-3-030-14177-6_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-14176-9

  • Online ISBN: 978-3-030-14177-6

  • eBook Packages: Computer Science, Computer Science (R0)
