
Hinode: implementing a vertex-centric modelling approach to maintaining historical graph data

Published in: Computing

Abstract

Over the past few years, there has been a rapid increase in data originating from evolving networks such as social networks and sensor networks. A major challenge in handling such networks and their respective graphs is supporting historical queries, that is, queries concerned with the state of the graph at previous time instances. While a number of works index the historical data in a time-centric manner (i.e. according to the time instance at which an update event occurs), in this work we focus on the less-explored vertex-centric storage approach (i.e. according to the entity on which an update event occurs). We demonstrate that the design choices for a vertex-centric model are not trivial by proposing two different modelling and storage models that leverage NoSQL technology and investigating their tradeoffs. More specifically, we experimentally evaluate the two models and show that, in certain cases, their relative performance can differ severalfold. Finally, we provide evidence that simple baseline and non-NoSQL solutions are slower by up to an order of magnitude.
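To make the distinction between time-centric and vertex-centric organisation concrete, the following is a minimal in-memory sketch. It is not either of the two NoSQL models proposed in the paper; the event tuples, the function names and the choice to attribute each edge event to a single vertex are all assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical update events as (time instance, vertex, change) tuples.
# Attributing an edge change to one endpoint is a simplifying assumption.
events = [
    (1, "a", "add vertex a"),
    (1, "b", "add vertex b"),
    (2, "a", "add edge a-b"),
    (3, "b", "remove edge a-b"),
]


def index_by_time(events):
    """Time-centric: group events by the time instance at which they occur."""
    index = defaultdict(list)
    for time, vertex, change in events:
        index[time].append((vertex, change))
    return dict(index)


def index_by_vertex(events):
    """Vertex-centric: group events by the entity on which they occur."""
    index = defaultdict(list)
    for time, vertex, change in events:
        index[vertex].append((time, change))
    return dict(index)


def history_of(vertex_index, vertex, until):
    """Return the changes recorded for one vertex up to and including `until`."""
    return [(t, c) for t, c in vertex_index.get(vertex, []) if t <= until]


by_time = index_by_time(events)
by_vertex = index_by_vertex(events)
print(by_time[1])
print(history_of(by_vertex, "a", until=2))
```

In this toy layout, a historical query about a single vertex touches only that vertex's entry in the vertex-centric index, whereas the time-centric index would have to be scanned across time instances.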


Notes

  1. Source code available at https://github.com/hinodeauthors/hinode.

  2. For reasons of clarity edge labels, vertex colors and edge weights are not shown.

  3. The queries executed were Average Vertex Degree, Degree Distribution and One-Hop Neighborhood Retrieval on the last snapshot of the sequence—see Sect. 5.1.

  4. citHep-Th SNAP Dataset [11]—see Sect. 5.1.

  5. Source code available at https://github.com/akosmato/HinodeNoSQL.

  6. Due to difficulties in assigning some specific edges to a particular snapshot, we removed \(0.4\%\) of the total edges in the “hep-th” and “hep-ph” datasets and \(0.04\%\) of the edges of the “USPatents” dataset.

  7. As an example, a sequence of 100 snapshots that gets indexed every 20 snapshots would comprise five smaller ST indices, whereas a vertex that exists in the first 75 snapshots would only be present in the first four smaller ST indices.

  8. For example, a performance ratio of 2 corresponds to MT requiring half the execution time of ST.

  9. https://neo4j.com/.

  10. Measured through the “nodetool” utility.
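The arithmetic in Notes 7 and 8 can be checked with a short sketch. It assumes that snapshots are numbered from 1, that each smaller ST index covers a contiguous block of snapshots, and that the performance ratio is ST execution time divided by MT execution time; the function names are hypothetical and do not come from the Hinode source code.

```python
import math


def num_partitions(total_snapshots, interval):
    """Number of smaller indices when a new one starts every `interval` snapshots."""
    return math.ceil(total_snapshots / interval)


def partitions_for_lifespan(first, last, interval):
    """1-based indices of the partitions that overlap snapshots first..last."""
    start = (first - 1) // interval + 1
    end = (last - 1) // interval + 1
    return list(range(start, end + 1))


def performance_ratio(st_time, mt_time):
    """Assumed reading of Note 8: ST time over MT time, so 2 means MT is twice as fast."""
    return st_time / mt_time


print(num_partitions(100, 20))              # 5
print(partitions_for_lifespan(1, 75, 20))   # [1, 2, 3, 4]
print(performance_ratio(10.0, 5.0))         # 2.0
```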

References

  1. Akiba T, Iwata Y, Yoshida Y (2014) Dynamic and historical shortest-path distance queries on large evolving networks by pruned landmark labeling. In: 23rd international world wide web conference, WWW’14, pp 237–248

  2. Apache Giraph. http://giraph.apache.org/. Accessed 12 July 2018

  3. Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512

  4. Gonzalez JE, Xin RS, Dave A, Crankshaw D, Franklin MJ, Stoica I (2014) GraphX: graph processing in a distributed dataflow framework. OSDI 14:599–613

  5. Huo W, Tsotras VJ (2014) Efficient temporal shortest path queries on evolving social graphs. In: Conference on scientific and statistical database management, SSDBM ’14, pp 38:1–38:4

  6. Khurana U, Deshpande A (2013) Efficient snapshot retrieval over historical graph data. In: 29th IEEE international conference on data engineering, ICDE 2013, Brisbane, April 8–12, pp 997–1008

  7. Khurana U, Deshpande A (2016) Storing and analyzing historical graph data at scale. In: Proceedings of the 19th international conference on extending database technology, EDBT 2016, pp 65–76

  8. Kosmatopoulos A, Giannakopoulou K, Papadopoulos AN, Tsichlas K (2016) An overview of methods for handling evolving graph sequences. In: Algorithmic aspects of cloud computing, pp 181–192. Springer, Berlin

  9. Kosmatopoulos A, Tsichlas K, Gounaris A, Sioutas S, Pitoura E (2017) Hinode: an asymptotically space-optimal storage model for historical queries on graphs. Distrib Parallel Databases 35:249. https://doi.org/10.1007/s10619-017-7207-z

  10. Labouseur AG, Birnbaum J, Olsen PW, Spillane SR, Vijayan J, Hwang J, Han W (2015) The G* graph database: efficiently managing large distributed dynamic graphs. Distrib Parallel Databases 33(4):479–514

  11. Leskovec J, Krevl A (2014) SNAP datasets: Stanford large network dataset collection. http://snap.stanford.edu/data

  12. Malewicz G, Austern MH, Bik AJ, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data, pp 135–146. ACM

  13. Ren C, Lo E, Kao B, Zhu X, Cheng R (2011) On querying historical evolving graph sequences. PVLDB 4(11):726–737

  14. Salzberg B, Tsotras VJ (1999) Comparison of access methods for time-evolving data. ACM Comput Surv (CSUR) 31(2):158–221

  15. Semertzidis K, Pitoura E (2016) Durable graph pattern queries on historical graphs. In: 32nd IEEE international conference on data engineering, ICDE 2016, Helsinki, May 16–20, 2016, pp 541–552

  16. Semertzidis K, Pitoura E, Lillis K (2015) Timereach: historical reachability queries on evolving graphs. In: Proceedings of the 18th international conference on extending database technology, EDBT 2015, Brussels, Belgium, March 23–27, pp 121–132

  17. Shao B, Wang H, Li Y (2013) Trinity: a distributed graph engine on a memory cloud. In: Proceedings of the ACM SIGMOD international conference on management of data, SIGMOD 2013, pp 505–516

  18. Spillane SR, Birnbaum J, Bokser D, Kemp D, Labouseur AG, Olsen PW, Vijayan J, Hwang J, Yoon J (2013) A demonstration of the \(\text{G}_{\ast }\) graph database system. In: 29th IEEE international conference on data engineering, ICDE 2013, Brisbane, April 8–12, pp 1356–1359

  19. Yang Y, Yu JX, Gao H, Pei J, Li J (2014) Mining most frequently changing component in evolving graphs. World Wide Web 17(3):351–376

Author information

Corresponding author

Correspondence to Andreas Kosmatopoulos.

About this article

Cite this article

Kosmatopoulos, A., Gounaris, A. & Tsichlas, K. Hinode: implementing a vertex-centric modelling approach to maintaining historical graph data. Computing 101, 1885–1908 (2019). https://doi.org/10.1007/s00607-019-00715-6

  • DOI: https://doi.org/10.1007/s00607-019-00715-6
