Abstract
Large-scale graph data management and mining in cloud environments have been widely discussed in recent years. The goal of this chapter is to discuss how X10, a Partitioned Global Address Space (PGAS) language, has been applied to programming data-intensive systems. Specifically, we focus on the problem of building scalable systems for storing and processing large-scale graph data on HPC clouds with X10. The chapter first discusses large-scale graph processing with X10. Next, it describes our experience designing and implementing a distributed graph database engine called Acacia with X10, focusing specifically on Acacia's RDF extension. Finally, it describes how a graph database benchmarking framework called XGDBench was developed to analyze the performance of graph database servers. Overall, the chapter describes our experience implementing such graph-based systems and frameworks with X10.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Dayarathna, M., Suzumura, T. (2017). High-Performance Graph Data Management and Mining in Cloud Environments with X10. In: Antonopoulos, N., Gillam, L. (eds) Cloud Computing. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-54645-2_7
DOI: https://doi.org/10.1007/978-3-319-54645-2_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54644-5
Online ISBN: 978-3-319-54645-2
eBook Packages: Computer Science, Computer Science (R0)