Abstract
Graph processing is widely used in domains such as social networks, bioinformatics, and information networks. Dataflow architectures have been shown to effectively address the low instruction-level parallelism and frequent branch mispredictions that graph applications suffer on existing general-purpose architectures. In this paper, targeting a customized heterogeneous dataflow architecture that combines the hardware advantages of both dataflow and traditional control-flow architectures, we propose a novel runtime system that adaptively offloads each subgraph to the appropriate underlying architecture. We also present a hybrid execution model to drive optimal performance. Our implementation on a CPU-FPGA platform achieves a 2.2x throughput improvement over a state-of-the-art CPU-FPGA graph processing accelerator and a 2.4x throughput improvement over a state-of-the-art FPGA-based design.
Acknowledgements
We would like to thank the anonymous reviewers for their insightful comments and valuable feedback. This work is supported by the National Key Research and Development Program of China under Grant No. 2018YFB1003502 and by the National Natural Science Foundation of China under Grant Nos. 61832006, 61825202, 61702201, and 61929103.
Cite this article
Chen, Q., Zheng, L., Liao, X. et al. Effective runtime scheduling for high-performance graph processing on heterogeneous dataflow architecture. CCF Trans. HPC 2, 362–375 (2020). https://doi.org/10.1007/s42514-020-00041-w
DOI: https://doi.org/10.1007/s42514-020-00041-w