The G* graph database: efficiently managing large distributed dynamic graphs

Labouseur, Alan G.; Birnbaum, Jeremy; Olsen, Paul W.; Spillane, Sean R.; Vijayan, Jayadevan; Hwang, Jeong-Hyon; Han, Wook-Shin

doi:10.1007/s10619-014-7140-3


Published: 13 March 2014

Volume 33, pages 479–514, (2015)

Distributed and Parallel Databases

Alan G. Labouseur¹,
Jeremy Birnbaum¹,
Paul W. Olsen Jr.¹,
Sean R. Spillane¹,
Jayadevan Vijayan¹,
Jeong-Hyon Hwang¹ &
Wook-Shin Han²,³


Abstract

From sensor networks to transportation infrastructure to social networks, we are awash in data. Many of these real-world networks tend to be large (“big data”) and dynamic, evolving over time. Their evolution can be modeled as a series of graphs. Traditional systems that store and analyze one graph at a time cannot effectively handle the complexity and subtlety inherent in dynamic graphs. Modern analytics require systems capable of storing and processing series of graphs. We present such a system. G* compresses dynamic graph data based on commonalities among the graphs in the series for deduplicated storage on multiple servers. In addition to the obvious space-saving advantage, large-scale graph processing tends to be I/O bound, so faster reads from and writes to stable storage enable faster results. Unlike traditional database and graph processing systems, G* executes complex queries on large graphs using distributed operators to process graph data in parallel. It speeds up queries on multiple graphs by processing graph commonalities only once and sharing the results across relevant graphs. This architecture not only provides scalability, but since G* is not limited to processing only what is available in RAM, its analysis capabilities are far greater than those of systems that are limited to what they can hold in memory. This paper presents G*’s design and implementation principles along with evaluation results that document its unique benefits over traditional graph processing systems.
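The two ideas named in the abstract, deduplicated storage of graph commonalities and processing each commonality only once, can be illustrated with a small sketch. This is not G*'s implementation: the data layout, the function names `deduplicate` and `degrees_per_graph`, and the choice of vertex degree as the query are all assumptions made for illustration.

```python
from collections import defaultdict


def deduplicate(snapshots):
    """Map each distinct vertex version to the set of graphs containing it.

    A "version" here is a (vertex, neighbor set) pair; this layout is a
    hypothetical stand-in for whatever G* stores on disk.
    """
    versions = defaultdict(set)
    for graph_id, adjacency in snapshots.items():
        for vertex, neighbors in adjacency.items():
            versions[(vertex, frozenset(neighbors))].add(graph_id)
    return versions


def degrees_per_graph(versions):
    """Compute each version's degree once and share it with every graph."""
    result = defaultdict(dict)
    evaluations = 0
    for (vertex, neighbors), graph_ids in versions.items():
        degree = len(neighbors)  # evaluated once per distinct version
        evaluations += 1
        for graph_id in graph_ids:  # result shared across relevant graphs
            result[graph_id][vertex] = degree
    return dict(result), evaluations


# Three snapshots of a small evolving network; g1 and g2 are identical.
snapshots = {
    "g1": {"a": {"b"}, "b": {"a"}, "c": set()},
    "g2": {"a": {"b"}, "b": {"a"}, "c": set()},
    "g3": {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}},
}

versions = deduplicate(snapshots)
degrees, evaluations = degrees_per_graph(versions)
# Number of vertex records if every snapshot were stored separately.
naive_records = sum(len(adjacency) for adjacency in snapshots.values())
```

In this toy input, storing every snapshot separately needs `naive_records` vertex records, while the deduplicated map holds only `len(versions)` entries, and the degree query touches each entry once regardless of how many snapshots share it.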


Notes

These systems cannot readily take advantage of commonalities among graphs and thereby suffer high space overhead. For example, one may consider using a relation to store edges of a series of graphs. In this case, for an edge contained in 100 snapshots, there will be 100 tuples for that edge, each differentiated by snapshot ID. This incurs high space overhead compared to our system, which supports deduplicated storage as described throughout this paper.
In this paper, we focus on managing graphs that correspond to periodic snapshots of an evolving network. Logging the input data allows G* to reconstruct graphs as of any point in the past by using periodic snapshots and log data. This feature is not further discussed in this paper.
The current G* implementation assigns each vertex to a server based on the hash value of the vertex ID. We are developing data distribution techniques that can reduce the number of edges whose endpoints are assigned to different servers.
As mentioned in the cost analysis of the put(v, d, g) method, updating the CGI for a version of vertex v also requires a lookup via maps(v).
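
The hash-based vertex assignment described in the notes above, and the cross-server edges it can produce, can be sketched as follows. The notes do not say which hash function G* uses, so CRC32 is an assumption chosen only because it is deterministic across runs; `server_for` and `cut_edges` are hypothetical names.

```python
import zlib


def server_for(vertex_id, num_servers):
    """Pick a server index from a hash of the vertex ID.

    CRC32 is an illustrative choice, not necessarily what G* uses.
    """
    digest = zlib.crc32(str(vertex_id).encode("utf-8"))
    return digest % num_servers


def cut_edges(edges, num_servers):
    """Return the edges whose two endpoints land on different servers."""
    return [
        (u, v)
        for u, v in edges
        if server_for(u, num_servers) != server_for(v, num_servers)
    ]


# A small example edge list with made-up vertex IDs.
edges = [("v1", "v2"), ("v2", "v3"), ("v3", "v1"), ("v4", "v1")]
placement = {v: server_for(v, 4) for edge in edges for v in edge}
crossing = cut_edges(edges, 4)
```

Because the hash ignores graph structure, neighboring vertices are placed independently of each other, which is why a structure-aware distribution technique could reduce the number of crossing edges.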


Acknowledgments

This research was supported by NSF CAREER award IIS-1149372 and also supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the “IT Consilience Creative Program” (NIPA-2013-H0203-13-1001) supervised by the NIPA (National IT Industry Promotion Agency).

Author information

Authors and Affiliations

Department of Computer Science, State University of New York, Albany, NY, USA
Alan G. Labouseur, Jeremy Birnbaum, Paul W. Olsen Jr., Sean R. Spillane, Jayadevan Vijayan & Jeong-Hyon Hwang
Department of Creative IT Engineering, Pohang University of Science and Technology, Pohang, Korea
Wook-Shin Han
Department of Computer Science and Engineering, Pohang University of Science and Technology, Pohang, Korea
Wook-Shin Han


Corresponding author

Correspondence to Alan G. Labouseur.

Additional information

Communicated by Haixun Wang and Jeffrey Xu Yu.


About this article

Cite this article

Labouseur, A.G., Birnbaum, J., Olsen, P.W. et al. The G* graph database: efficiently managing large distributed dynamic graphs. Distrib Parallel Databases 33, 479–514 (2015). https://doi.org/10.1007/s10619-014-7140-3

Issue Date: December 2015
DOI: https://doi.org/10.1007/s10619-014-7140-3
