Abstract
Poor locality as a natural property of graph data structures causes enormous amount of network traffic in large-scale distributed graph processing systems. Moreover, data transmission through the network is one of the most expensive operations in a distributed system. Therefore, reduction of network usage is highly required by a new graph computational model. In this paper, increasing the degree of machine independency has been considered a key factor of network traffic reduction. The proposed system benefits from a three-layered computational model to perfectly leverage the power of local information as much as possible. Moreover, this model simultaneously takes the advantages of both message-based and shared-state communication paradigms. Vertices can read and update values of others in the lowest layer directly, while they must send messages in other layers. By the use of memorization techniques, the proposed model introduces a new kind of intelligence that has encouraging effects on removing useless communications. Distinctive results of our experiments confirm significant improvements of the proposed model in relation to the previous systems like Pregel, GPS, and Blogel, as well as ExPregel. The results also show that the overhead of making processes independent along with intelligent is negligible in comparison with the cost of additional network communications.












Similar content being viewed by others
References
Kimura M et al (2010) Extracting influential nodes on a social network for information diffusion. Data Min Knowl Discov 20(1):70–97
Ma H et al (2008) Mining social networks using heat diffusion processes for marketing candidates selection. In: Proceedings of the 17th ACM Conference on Information and Knowledge Management. ACM, Napa Valley, California, USA, pp 233–242
Saito K et al (2012) Efficient discovery of influential nodes for SIS models in social networks. Knowl Inf Syst 30(3):613–635
Becchetti L et al (2006) Link-based characterization and detection of web spam. In: AIRWeb
Castillo C et al (2007) Know your neighbors: Web spam detection using the web topology. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM
Arabnia HR, Oliver MA (1987) Arbitrary rotation of raster images with SIMD machine architectures. In: Computer Graphics Forum. Wiley Online Library
Golmohammadi K, Zaiane OR (2012) Data mining applications for fraud detection in securities market. In: Intelligence and Security Informatics Conference (EISIC), 2012 European
Arabnia HR (1990) A parallel algorithm for the arbitrary rotation of digitized images using process-and-data-decomposition approach. J Parallel Distrib Comput 10(2):188–192
Wani MA (2003) Arabnia HR Parallel edge-region-based segmentation algorithm targeted at reconfigurable multiring network. J Supercomput 25(1):43–62
Arabnia HR (1995) A distributed stereocorrelation algorithm. In: Computer Communications and Networks, 1995. Proceedings., Fourth International Conference on. : IEEE
Arabnia HR, Smith JW (1993) A reconfigurable interconnection network for imaging operations and its implementation using a multi-stage switching box. In: Proceedings of the 7th Annual International High Performance Computing Conference
Malewicz G et al (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. ACM, Indianapolis, Indiana, USA, pp 135–146
McCune RR, Weninger T, Madey G Thinking Like a vertex: a survey of vertex-centric frameworks for large-scale distributed graph processing
Batarfi O et al (2015) Large scale graph processing systems: survey and an experimental evaluation. Clust Comput 18(3):1189–1213
Han M et al (2014) An experimental comparison of pregel-like graph processing systems. Proce VLDB Endow 7(12):1047–1058
Sagharichian M, Naderi H, Haghjoo M (2015) ExPregel: a new computational model for large-scale graph processing. Concurr Comput Pract Exp 27(17):4954–4969
Arabnia HR, Bhandarkar SM (1996) Parallel stereocorrelation on a reconfigurable multi-ring network. J Supercomput 10(3):243–269
Bhandarkar SM, Arabnia HR (1995) The REFINE multiprocessor—theoretical properties and algorithms. Parallel Comput 21(11):1783–1805
Bhandarkar SM, Arabnia HR (1995) The Hough transform on a reconfigurable multi-ring network. J Parallel Distrib Comput 24(1):107–114
Gregor D, Lumsdaine A (2005) The parallel BGL: a generic library for distributed graph computations. Parallel Object-Oriented Scientific Computing (POOSC)
Leskovec J, Faloutsos C (2006) Sampling from large graphs. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, Philadelphia, PA, USA, pp 631–636
Ribeiro B, Towsley D (2010) Estimating and sampling graphs with multidimensional random walks. In: Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement. ACM, Melbourne, Australia, pp 390–403
Kang U, Tsourakakis CE, Faloutsos C (2009) Pegasus: a peta-scale graph mining system implementation and observations. In: Data Mining, 2009. ICDM’09. Ninth IEEE International Conference on.: IEEE
Kang U et al (2008) Hadi: fast diameter estimation and mining in massive graphs with hadoop. Carnegie Mellon University. School of Computer Science, Machine Learning Department
Arabnia HR, Oliver MA (1989) A transputer network for fast operations on digitised images. In: Computer graphics forum. Wiley Online Library
Arabnia HR, Oliver MA (1987) A transputer network for the arbitrary rotation of digitised images. Comput J 30(5):425–432
Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113
Jain N, Liao G, Willke TL (2013) GraphBuilder: scalable graph ETL framework. In: First International Workshop on Graph Data Management Experiences and Systems. ACM, New York, pp 1–6
Rastogi V et al (2012) Finding connected components on map-reduce in logarithmic rounds. arXiv preprint arXiv:1203.5387
Srirama SN, Jakovits P, Vainikko E (2012) Adapting scientific computing problems to clouds using MapReduce. Future Gener Comput Syst 28(1):184–192
Shao B, Wang H, Li Y (2012) The trinity graph engine. Technical Report 161291, Microsoft Research
Chen R et al (2010) Large graph processing in the cloud. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. ACM, Indianapolis, Indiana, USA, pp 1123–1126
Apache Giraph. Available from: http://giraph.apache.org/
Apache Hama. Available from: http://hama.apache.org/
Salihoglu S, Widom J (2012) Gps: a graph processing system
Cai Z, Logothetis D, Siganos G (2012) Facilitating real-time graph mining. In: Proceedings of the Fourth International Workshop on Cloud Data Management. ACM, Maui, Hawaii, USA, pp 1–8
Kalnis P et al (2012) Mizan: optimizing graph mining in large parallel systems
Bao NT, Suzumura T (2013) Towards highly scalable pregel-based graph processing platform with \(\times \)10. In: Proceedings of the 22nd International Conference on World Wide Web Companion. International World Wide Web Conferences Steering Committee: Rio de Janeiro, Brazil, pp 501–508
Krepska E et al (2011) HipG: parallel processing of large-scale graphs. SIGOPS Oper Syst Rev 45(2):3–13
Low Y et al (2012) Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proc VLDB Endow 5(8):716–727
Low Y et al (2011) Graphlab: a distributed framework for machine learning in the cloud. arXiv preprint arXiv:1107.0922
Che S (2014) GasCL: a vertex-centric graph model for GPUs. In: IEEE High Performance Extreme Computing Conference (HPEC)
Che S, Beckmann BM, Reinhardt SK (2014) BelRed: constructing GPGPU graph applications with software building blocks In: High Performance Extreme Computing Conference (HPEC), 2014 IEEE, Waltham, MA, pp 1–6. doi:10.1109/HPEC.2014.7040961
Zhong J, He B (2013) Medusa: simplified graph processing on GPUs. IEEE Trans Parallel Distrib Syst 99:1–1
Roy A, Mihailovic I, Zwaenepoel W (2013) X-Stream: edge-centric graph processing using streaming partitions, In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, Farminton, Pennsylvania, pp 472–488
Yuan P et al (2014) Fast iterative graph computation: a path centric approach. In: High Performance Computing, Networking, Storage and Analysis, SC14: International Conference for IEEE
Xie W et al (2013) Fast iterative graph computation with block updates. Proc VLDB Endow 6(14):2014–2025
Yan D et al (2014) Blogel: a block-centric framework for distributed computation on real-world graphs. Proc VLDB Endow 7(14)
Simmhan Y et al (2014) GoFFish: a sub-graph centric framework for large-scale graph analytics. In: Silva F, Dutra I, Santos Costa V (eds) Euro-Par 2014 Parallel Processing. Springer International Publishing, pp 451–462
Chen R et al (2012) Improving large graph processing on partitioned graphs in the cloud. In: Proceedings of the Third ACM Symposium on Cloud Computing. ACM, San Jose, California, pp 1–13
Yang S et al (2012) Towards effective partition management for large graphs. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM, Scottsdale, Arizona, USA, pp 517–528
Tian Y et al (2013) From “Think Like a Vertex” to “Think Like a Graph”. Proc VLDB Endow 7(3)
Wang G et al (2013) Asynchronous large-scale graph processing made easy. In: CIDR
Han M et al (2014) An experimental comparison of pregel-like graph processing systems. Proc VLDB Endow 7(12):1047–1058
Stanford Large Network Dataset Collection (2014) Available from: http://snap.stanford.edu/data/
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Sagharichian, M., Naderi, H. Intelligent and independent processes for overcoming big graphs. J Supercomput 73, 1438–1466 (2017). https://doi.org/10.1007/s11227-016-1834-4
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-016-1834-4