Abstract
Online graph database service providers have started migrating their operations to public clouds in response to the increasing demand for low-cost, ubiquitous graph data storage and analysis. However, there is little support available for benchmarking graph database systems in cloud environments. We describe XGDBench, a graph database benchmarking platform for cloud computing systems. XGDBench is designed as an extensible platform for graph database benchmarking, which makes it suitable for benchmarking future HPC systems. We created XGDBench by extending the Yahoo! Cloud Serving Benchmark (YCSB) to the area of graph database benchmarking. The benchmarking platform is written in X10, a PGAS (partitioned global address space) language intended for programming future HPC systems. We describe the architecture of XGDBench and explain how it differs from the current state of the art. We evaluate the performance of five well-known graph data stores (AllegroGraph, Fuseki, Neo4j, OrientDB, and Titan) using XGDBench on the Tsubame 2.0 HPC cloud environment.

Acknowledgements
This research was supported by the Japan Science and Technology Agency’s CREST project titled “Development of System Software Technologies for post-Peta Scale High Performance Computing”.
Cite this article
Dayarathna, M., Suzumura, T. Graph database benchmarking on cloud environments with XGDBench. Autom Softw Eng 21, 509–533 (2014). https://doi.org/10.1007/s10515-013-0138-7
DOI: https://doi.org/10.1007/s10515-013-0138-7