Skip to main content
Log in

Cacheap: Portable and Collaborative I/O Optimization for Graph Processing

  • Regular Paper
  • Published:
Journal of Computer Science and Technology Aims and scope Submit manuscript

Abstract

Increasingly there is a need to process graphs that are larger than the available memory on today’s machines. Many systems have been developed with graph representations that are efficient and compact for out-of-core processing. A necessary task in these systems is memory management. This paper presents a system called Cacheap which automatically and efficiently manages the available memory to maximize the speed of graph processing, minimize the amount of disk access, and maximize the utilization of memory for graph data. It has a simple interface that can be easily adopted by existing graph engines. The paper describes the new system, uses it in recent graph engines, and demonstrates its integer factor improvements in the speed of large-scale graph processing.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Coffman T, Greenblatt S, Marcus S. Graph-based technologies for intelligence analysis. Communications of the ACM, 2004, 47(3): 45-47.

    Article  Google Scholar 

  2. Han W, Miao Y, Li K, Wu M, Yang F, Zhou L, Prabhakaran V, Chen W, Chen E. Chronos: A graph engine for temporal graph analysis. In Proc. the 9th Eurosys Conference, April 2014, Article No. 1.

  3. Jeong H, Mason P S, Barabasi A L, Oltvai N Z. Lethality and centrality in protein networks. Nature, 2001, 411(6833): 41-42.

    Article  Google Scholar 

  4. Xiang L, Yuan Q, Zhao S, Chen L, Zhang X, Yang Q, Sun J. Temporal recommendation on graphs via long- and shortterm preference fusion. In Proc. the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 2010, pp.723-732.

  5. Cheng J, Liu Q, Li Z, Fan W, Lui C S J, He C. VENUS: Vertex-centric streamlined graph computation on a single PC. In Proc. the 31st IEEE International Conference on Data Engineering, April 2015, pp.1131-1142.

  6. Chi Y, Dai G, Wang Y, Sun G, Li G, Yang H. NXgraph: An efficient graph processing system on a single machine. In Proc. the 32nd IEEE International Conference on Data Engineering, May 2016, pp.409-420.

  7. Han W, Lee S, Park K, Lee J, Kim M, Kim J, Yu H. Turbo-Graph: A fast parallel graph engine handling billion-scale raphs in a single PC. In Proc. the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2013, pp.77-85.

  8. Maass S, Min C, Kashyap S, Kang W, Kumar M, Kim T. Mosaic: Processing a trillion-edge graph on a single machine. In Proc. the 12th European Conference on Computer Systems, April 2017, pp.527-543.

  9. Roy A, Mihailovic I, Zwaenepoel W. X-Stream: Edgecentric graph processing using streaming partitions. In Proc. the 24th ACM SIGOPS Symposium of Operating Systems Principles, November 2013, pp.472-488.

  10. Vora K, Xu G Q, Gupta R. Load the edges you need: A generic I/O optimization for disk-based graph processing. In Proc. the 2016 USENIX Annual Technical Conference, June 2016, pp.507-522.

  11. Zhang Y, Liao X, Jin H, Gu L, Tan G, Zhou B. Hot-Graph: Efficient asynchronous processing for real-world graphs. IEEE Transactions on Computers, 2017, 66(5): 799-809.

    Article  MathSciNet  MATH  Google Scholar 

  12. Zhu X, Han W, Chen W. GridGraph: Large-scale graph processing on a single machine using 2-level hierarchical partitioning. In Proc. the 2015 USENIX Annual Technical Conference, July 2015, pp.375-386.

  13. Kyrola A, Blelloch E G, Guestrin C. GraphChi: Large-scale graph computation on just a PC. In Proc. the 10th USENIX Symposium on Operating Systems Design and Implementation, October 2012, pp.31-46.

  14. Zheng D, Mhembere D, Burns C R, Vogelstein T J, Priebe E C, Szalay S A. FlashGraph: Processing billion-node graphs on an array of commodity SSDs. In Proc. the 13th USENIX Conference on File and Storage Technologies, February 2015, pp.45-58.

  15. Nguyen D, Lenharth A, Pingali K. A lightweight infrastructure for graph analytics. In Proc. the 24th ACM SIGOPS Symposium on Operating Systems Principles, November 2013, pp.456-471.

  16. Chhugani J, Satish N, Kim C, Sewall J, Dubey P. Fast and efficient graph traversal algorithm for CPUs: Maximizing single-node efficiency. In Proc. the 26th IEEE International Parallel and Distributed Processing Symposium, May 2012, pp.378-389.

  17. Gonzalez E J, Low Y, Gu H, Bickson D, Guestrin C. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proc. the 10th USENIX Symposium on Operating Systems Design and Implementation, October 2012, pp.17-30.

  18. Gonzalez E J, Xin S R, Dace A, Crankshaw D, Franklin JM, Stoica I. GraphX: Graph processing in distributed dataflow framework. In Proc. the 11th USENIX Symposium on Operating Systems Design and Implementation, October 2014, pp.599-613.

  19. Liu H, Huang H H. Graphene: Fine-grained IO management for graph computing. In Proc. the 15th USENIX Conference on File and Storage Technologies, February 2017, pp.285-300.

  20. Malexicz G, Austern H M, Bik J C A, Dehnert C J, Horn I, Leiser N, Czajkowski G. Pregel: A system for largescale graph processing. In Proc. the ACM SIGMOD International Conference on Management of Data, June 2010, pp.135-146.

  21. Roy A, Bindschaedler L, Malicevic J, Zwaenepoel W. Chaos: Scale-out graph processing from secondary storage. In Proc. the 25th Symposium on Operating Systems Principles, October 2015, pp.410-424.

  22. Kwak H, Lee C, Park H, Moon B S. What is Twitter, a social network or a news media? In Proc. the 19th International Conference on World Wide Web, April 2010, pp.591-600.

  23. Boldi P, Vigna S. TheWebGraph framework I: Compression techniques. In Proc. the 13th International World Wide Web Conference, May 2004, pp.595-602.

  24. Belady A L. A study of replacement algorithms for a virtualstorage computer. IBM Systems Journal, 1966, 5(2): 78-101.

    Article  Google Scholar 

  25. Mattson L R, Gecsei J, Slutz D, Traiger L I. Evaluation techniques for storage hierarchies. IBM Systems Journal, 1970, 9(2): 78-117.

    Article  MATH  Google Scholar 

  26. Faloutsos M, Faloutsos P, Faloutsos C. On power-law relationships of the Internet topology. In Proc. the 1999 ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, August 1999, pp.251-262.

  27. Wang K, Xu H G, Su Z, Liu D Y. Graph Q: Graph query processing with abstraction refinement: Scalable and programmable analytics over very large graphs on a single PC. In Proc. the 2015 USENIX Annual Technical Conference, July 2015, pp.387-401.

  28. Wu M, Yang F, Xue J, Xiao W, Miao Y, Wei L, Lin H, Dai Y, Zhou L. GraM: Scaling graph computation to the trillions. In Proc. the 6th ACM Symposium on Cloud Computing, August 2016, pp.408-421.

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Lei Liu.

Electronic supplementary material

ESM 1

(PDF 483 kb)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zhao, P., Ding, C., Liu, L. et al. Cacheap: Portable and Collaborative I/O Optimization for Graph Processing. J. Comput. Sci. Technol. 34, 690–706 (2019). https://doi.org/10.1007/s11390-019-1936-6

Download citation

  • Received:

  • Revised:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11390-019-1936-6

Keywords

Navigation