Effective runtime scheduling for high-performance graph processing on heterogeneous dataflow architecture

  • Regular Paper
CCF Transactions on High Performance Computing

Abstract

Graph processing is widely used in domains such as social networks, bioinformatics, and information networks. Dataflow architectures have been shown to effectively address the low instruction-level parallelism and branch mispredictions that graph applications suffer on existing general-purpose architectures. In this paper, targeting a customized heterogeneous dataflow architecture that combines the hardware advantages of dataflow and traditional control-flow architectures, we propose a novel runtime system that adaptively offloads each subgraph to the appropriate underlying architecture. We also present a hybrid execution model to drive optimal performance. Our implementation on a CPU-FPGA platform achieves a 2.2x throughput improvement over a state-of-the-art CPU-FPGA graph processing accelerator and a 2.4x throughput improvement over a state-of-the-art FPGA-based design.
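
To make the idea of per-subgraph offloading concrete, the following is a minimal sketch of how a runtime might assign each subgraph to one of two backends. It is not the scheduling policy of the paper, which the abstract does not specify. The contiguous source-range partitioning, the edge-count rule, the `edge_threshold` parameter, and the backend labels `"dataflow"` and `"control"` are all assumptions introduced purely for illustration.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (source vertex, destination vertex)


def partition_by_source(edges: List[Edge], num_vertices: int,
                        num_parts: int) -> List[List[Edge]]:
    """Split edges into subgraphs by contiguous source-vertex ranges.

    Hypothetical partitioning scheme, chosen only to keep the sketch short.
    """
    size = max(1, -(-num_vertices // num_parts))  # ceiling division
    parts: List[List[Edge]] = [[] for _ in range(num_parts)]
    for src, dst in edges:
        parts[min(src // size, num_parts - 1)].append((src, dst))
    return parts


def choose_target(subgraph: List[Edge], edge_threshold: int) -> str:
    """Pick a backend for one subgraph.

    Assumed rule (not from the paper): subgraphs with many edges go to the
    dataflow side, small ones stay on the control-flow side.
    """
    return "dataflow" if len(subgraph) >= edge_threshold else "control"


def schedule(edges: List[Edge], num_vertices: int, num_parts: int,
             edge_threshold: int) -> Dict[int, str]:
    """Return a mapping from subgraph index to the chosen backend label."""
    parts = partition_by_source(edges, num_vertices, num_parts)
    return {i: choose_target(p, edge_threshold) for i, p in enumerate(parts)}


# Toy graph: vertices 0..7, with most edges leaving low-numbered vertices.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (5, 6)]
plan = schedule(toy_edges, num_vertices=8, num_parts=2, edge_threshold=3)
```

In this toy run the first subgraph holds four edges and the second holds one, so with a threshold of three the first is labeled `"dataflow"` and the second `"control"`. A real runtime would presumably base the decision on measured or modeled performance rather than a fixed edge count.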

Acknowledgements

We would like to thank the anonymous reviewers for their insightful comments and valuable feedback. This work is supported by the National Key Research and Development Program of China under Grant No. 2018YFB1003502 and by the National Natural Science Foundation of China under Grant Nos. 61832006, 61825202, 61702201, and 61929103.

Author information

Corresponding author

Correspondence to Hai Jin.

About this article

Cite this article

Chen, Q., Zheng, L., Liao, X. et al. Effective runtime scheduling for high-performance graph processing on heterogeneous dataflow architecture. CCF Trans. HPC 2, 362–375 (2020). https://doi.org/10.1007/s42514-020-00041-w

  • DOI: https://doi.org/10.1007/s42514-020-00041-w
