Effective runtime scheduling for high-performance graph processing on heterogeneous dataflow architecture

Chen, Qingxiang; Zheng, Long; Liao, Xiaofei; Jin, Hai; Wang, Qinggang

doi:10.1007/s42514-020-00041-w

Regular Paper
Published: 28 July 2020

CCF Transactions on High Performance Computing
Volume 2, pages 362–375 (2020)

Qingxiang Chen¹, Long Zheng¹, Xiaofei Liao¹, Hai Jin¹ (ORCID: orcid.org/0000-0002-3934-7605) & Qinggang Wang¹

Abstract

Graph processing is widely used in domains such as social networks, bioinformatics, and information networks. Dataflow architectures have been shown to effectively address the low instruction-level parallelism and branch mispredictions that graph applications suffer on existing general-purpose architectures. In this paper, targeting a customized heterogeneous dataflow architecture that integrates the hardware advantages of both the dataflow architecture and the traditional control architecture, we propose a novel runtime system that adaptively offloads each subgraph to the appropriate underlying architecture. We also present a hybrid execution model to drive optimal performance. Our implementation on a CPU-FPGA platform achieves a 2.2x throughput improvement over a state-of-the-art CPU-FPGA graph processing accelerator and a 2.4x throughput improvement over a state-of-the-art FPGA-based design.

Acknowledgements

We would like to thank the anonymous reviewers for their insightful comments and valuable feedback. This work is supported by the National Key Research and Development Program of China under Grant No. 2018YFB1003502 and by the National Natural Science Foundation of China under Grant Nos. 61832006, 61825202, 61702201, and 61929103.

Author information

Authors and Affiliations

National Engineering Research Center for Big Data Technology and System/Service Computing Technology and System Lab/Cluster and Grid Computing Lab, Huazhong University of Science and Technology, Wuhan, 430074, China
Qingxiang Chen, Long Zheng, Xiaofei Liao, Hai Jin & Qinggang Wang

Corresponding author

Correspondence to Hai Jin.

About this article

Cite this article

Chen, Q., Zheng, L., Liao, X. et al. Effective runtime scheduling for high-performance graph processing on heterogeneous dataflow architecture. CCF Trans. HPC 2, 362–375 (2020). https://doi.org/10.1007/s42514-020-00041-w

Received: 17 February 2020
Accepted: 02 July 2020
Published: 28 July 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s42514-020-00041-w
