Abstract
There is a growing need to process graphs that are larger than the memory available on today’s machines. Many systems have been developed with compact, efficient graph representations for out-of-core processing, and all of them must manage memory. This paper presents Cacheap, a system that automatically and efficiently manages the available memory to maximize graph-processing speed, minimize disk access, and maximize memory utilization for graph data. Its simple interface can be easily adopted by existing graph engines. The paper describes the system, integrates it into recent graph engines, and demonstrates integer-factor improvements in the speed of large-scale graph processing.
Cite this article
Zhao, P., Ding, C., Liu, L. et al. Cacheap: Portable and Collaborative I/O Optimization for Graph Processing. J. Comput. Sci. Technol. 34, 690–706 (2019). https://doi.org/10.1007/s11390-019-1936-6