Abstract
Graphs play an indispensable role in a wide range of application domains. Graph processing at scale, however, faces challenges at all levels, from system architectures to programming models. In this chapter, we review the challenges of parallel processing of large graphs, representative graph processing systems, general principles for designing large graph processing systems, and various graph computation paradigms. Graph processing covers a wide range of topics, and graphs can be represented in different forms. Different graph representations lead to different computation paradigms and system architectures. From the perspective of graph representation, this chapter also briefly introduces a few alternatives to the adjacency list.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Shao, B., Li, Y. (2018). Parallel Processing of Graphs. In: Fletcher, G., Hidders, J., Larriba-Pey, J. (eds) Graph Data Management. Data-Centric Systems and Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-96193-4_5
DOI: https://doi.org/10.1007/978-3-319-96193-4_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96192-7
Online ISBN: 978-3-319-96193-4
eBook Packages: Computer Science, Computer Science (R0)