High-Performance Graph Data Management and Mining in Cloud Environments with X10

Cloud Computing

Part of the book series: Computer Communications and Networks (CCN)

Abstract

Large-scale graph data management and mining in cloud environments has been widely discussed in recent years. This chapter discusses how X10, a Partitioned Global Address Space (PGAS) language, has been applied to programming data-intensive systems. Specifically, we focus on the problem of building scalable systems for storing and processing large-scale graph data on HPC clouds with X10. The chapter first discusses large-scale graph processing with X10. Next, it describes the experience of designing and implementing Acacia, a distributed graph database engine, with X10, focusing in particular on Acacia’s RDF extension. Finally, it describes how XGDBench, a graph database benchmarking framework, was developed to analyze the performance of graph database servers. Overall, the chapter describes our experiences of implementing such graph-based systems and frameworks with X10.

Corresponding author

Correspondence to Miyuru Dayarathna.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Dayarathna, M., Suzumura, T. (2017). High-Performance Graph Data Management and Mining in Cloud Environments with X10. In: Antonopoulos, N., Gillam, L. (eds) Cloud Computing. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-54645-2_7

  • DOI: https://doi.org/10.1007/978-3-319-54645-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54644-5

  • Online ISBN: 978-3-319-54645-2

  • eBook Packages: Computer Science (R0)
