ABSTRACT
With the volume of graph-structured data continually increasing, graph processing systems that can scale out and scale up are needed to handle extreme-scale datasets. While existing distributed out-of-core solutions have made this possible, they suffer from limited performance due to excessive I/O and communication costs.
We present DFOGraph, a distributed fully-out-of-core graph processing system that combines multiple techniques to enable I/O- and communication-efficient processing. DFOGraph builds upon two-level partitions with adaptive compressed representations to allow fine-grained selective computation and communication. Our evaluation shows that DFOGraph outperforms Chaos and HybridGraph by more than 12.94× and 10.82×, respectively, when scaling out to eight nodes.
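The abstract does not describe DFOGraph's actual data layout, so the following is only a minimal sketch of how two-level partitioning, a per-block choice of representation, and selective processing might fit together. Every name here (`build_blocks`, `encode_block`, `active_blocks`, `batch_size`, `dense_threshold`) is hypothetical, and the CSR-versus-sparse rule is an assumed stand-in for "adaptive compressed representations", not the system's real policy.

```python
from collections import defaultdict


def build_blocks(edges, num_vertices, num_nodes, batch_size):
    """Group edges into blocks keyed by (source batch, destination node).

    Level 1 (assumed): vertices are range-partitioned across nodes.
    Level 2 (assumed): vertices are further grouped into fixed-size batches.
    """
    per_node = -(-num_vertices // num_nodes)  # ceiling division
    blocks = defaultdict(list)
    for src, dst in edges:
        blocks[(src // batch_size, dst // per_node)].append((src, dst))
    return blocks


def encode_block(block_edges, batch_start, batch_size, dense_threshold=0.5):
    """Choose a representation based on how many sources in the batch have edges."""
    adj = defaultdict(list)
    for src, dst in block_edges:
        adj[src].append(dst)
    if len(adj) / batch_size >= dense_threshold:
        # Dense block: CSR-style offsets for every vertex in the batch.
        offsets, targets = [0], []
        for v in range(batch_start, batch_start + batch_size):
            targets.extend(adj.get(v, []))
            offsets.append(len(targets))
        return ("csr", offsets, targets)
    # Sparse block: store only the sources that actually have edges.
    return ("sparse", sorted(adj.items()))


def active_blocks(blocks, active_vertices, batch_size):
    """Return only the blocks whose source batch contains an active vertex."""
    active_batches = {v // batch_size for v in active_vertices}
    return sorted(key for key in blocks if key[0] in active_batches)
```

In this sketch, an iteration would read and send only the blocks returned by `active_blocks`, which is one way that fine-grained partitions can reduce both disk I/O and network traffic when few vertices are active.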
References
- Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna. 2011. Layered Label Propagation: A MultiResolution Coordinate-Free Ordering for Compressing Social Networks. In Proceedings of the 20th International Conference on World Wide Web, Sadagopan Srinivasan, Krithi Ramamritham, Arun Kumar, M. P. Ravindra, Elisa Bertino, and Ravi Kumar (Eds.). ACM Press, 587–596.
- Paolo Boldi and Sebastiano Vigna. 2004. The WebGraph Framework I: Compression Techniques. In Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004). ACM Press, Manhattan, USA, 595–601.
- Deepayan Chakrabarti, Yiping Zhan, and Christos Faloutsos. 2004. R-MAT: A recursive model for graph mining. In Proceedings of the 2004 SIAM International Conference on Data Mining. SIAM, 442–446.
- Seongyun Ko and Wook-Shin Han. 2018. TurboGraph++: A scalable and fast graph analytics system. In Proceedings of the 2018 International Conference on Management of Data. ACM, 395–410.
- Jurij Leskovec, Deepayan Chakrabarti, Jon Kleinberg, and Christos Faloutsos. 2005. Realistic, mathematically tractable graph generation and evolution, using Kronecker multiplication. In European Conference on Principles of Data Mining and Knowledge Discovery. Springer, 133–145.
- Heng Lin, Xiaowei Zhu, Bowen Yu, Xiongchao Tang, Wei Xue, Wenguang Chen, Lufei Zhang, Torsten Hoefler, Xiaosong Ma, Xin Liu, et al. 2018. ShenTu: Processing multi-trillion edge graphs on millions of cores in seconds. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis. IEEE Press, 56.
- Robert Ryan McCune, Tim Weninger, and Greg Madey. 2015. Thinking like a vertex: A survey of vertex-centric frameworks for large-scale distributed graph processing. ACM Computing Surveys (CSUR) 48, 2 (2015), 25.
- Amitabha Roy, Laurent Bindschaedler, Jasmina Malicevic, and Willy Zwaenepoel. 2015. Chaos: Scale-out graph processing from secondary storage. In Proceedings of the 25th Symposium on Operating Systems Principles. ACM, 410–424.
- Zhigang Wang, Yu Gu, Yubin Bao, Ge Yu, and Jeffrey Xu Yu. 2016. Hybrid pulling/pushing for I/O-efficient distributed and iterative graph computing. In Proceedings of the 2016 International Conference on Management of Data. ACM, 479–494.