Abstract
Over the past few years, there has been a rapid increase in data originating from evolving networks such as social networks and sensor networks, among others. A major challenge in handling such networks and their respective graphs is supporting historical queries, that is, queries concerned with the state of the graph at previous time instances. While a number of works index historical data in a time-centric manner (i.e., according to the time instance at which an update event occurs), in this work we focus on the less-explored vertex-centric storage approach (i.e., according to the entity on which an update event occurs). We demonstrate that the design choices for a vertex-centric model are not trivial by proposing two different modelling and storage models that leverage NoSQL technology and investigating their tradeoffs. More specifically, we experimentally evaluate the two models and show that, in certain cases, their relative performance can differ by a factor of several. Finally, we provide evidence that simple baseline and non-NoSQL solutions are slower by up to an order of magnitude.
Notes
Source code available at https://github.com/hinodeauthors/hinode.
For reasons of clarity, edge labels, vertex colors and edge weights are not shown.
The queries executed were Average Vertex Degree, Degree Distribution and One-Hop Neighborhood Retrieval on the last snapshot of the sequence (see Sect. 5.1).
Source code available at https://github.com/akosmato/HinodeNoSQL.
Due to difficulties in assigning some specific edges to a particular snapshot, we removed \(0.4\%\) of the total edges in the “hep-th” and “hep-ph” datasets and \(0.04\%\) of the edges in the “USPatents” dataset.
As an example, a sequence of 100 snapshots that is indexed every 20 snapshots would comprise five smaller ST indices, whereas a vertex that exists in the first 75 snapshots would be present only in the first four smaller ST indices.
For example, a performance ratio of 2 means that MT requires half the execution time of ST.
Measured through the “nodetool” utility.
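The arithmetic behind the note on segmented ST indices (100 snapshots, indexed every 20 snapshots) can be checked with a minimal sketch. It assumes snapshots are numbered from 1 and that each smaller index covers one contiguous block of snapshots; the function names are hypothetical and are not taken from the Hinode source code.

```python
def st_index_count(num_snapshots: int, interval: int) -> int:
    """Number of smaller ST indices if a new index starts every `interval` snapshots.

    Assumption: the last index may cover fewer than `interval` snapshots.
    """
    return (num_snapshots + interval - 1) // interval  # integer ceiling


def indices_containing(first: int, last: int, interval: int) -> range:
    """1-based positions of the smaller indices that overlap snapshots [first, last].

    Assumption: index i covers snapshots (i - 1) * interval + 1 .. i * interval.
    """
    return range((first - 1) // interval + 1, (last - 1) // interval + 2)


# The example from the note: 100 snapshots, one index per 20 snapshots,
# and a vertex that exists in snapshots 1 through 75.
print(st_index_count(100, 20))              # 5
print(list(indices_containing(1, 75, 20)))  # [1, 2, 3, 4]
```

Under these assumptions, a vertex is stored only in the indices whose snapshot range overlaps its lifespan, which is why the vertex in the example is absent from the fifth index.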
Cite this article
Kosmatopoulos, A., Gounaris, A. & Tsichlas, K. Hinode: implementing a vertex-centric modelling approach to maintaining historical graph data. Computing 101, 1885–1908 (2019). https://doi.org/10.1007/s00607-019-00715-6
DOI: https://doi.org/10.1007/s00607-019-00715-6